Shell Programming/Features


Features

As programming languages, shells typically have very limited features: their main power comes from calling external commands.

However, there are a few features that are quite powerful and less commonly found or less well-known in other languages:

  • Streams, pipes, and redirection
  • Process management
  • Traps

These are particularly useful on multicore systems, as the processes can often run in parallel.

Streams

Streams allow one to operate on arbitrarily large data (codata) without needing to read it all in at once, progressively reading input or writing output; the most basic streams are standard input and standard output. Most simply this is done by operating on text data one line at a time. Pipes allow one to connect an output stream of one program to an input stream of another, allowing concurrent processing of data. Many of the standard Unix commands operate on streams, particularly for text processing, and are designed for use in pipes – these are sometimes known as filters. A standard idiom for shell scripts is to build a pipeline, connecting these to quickly build sophisticated and relatively efficient programs.
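As a small illustration of the pipeline idiom, the sketch below chains standard filters to report the most frequent words in a text stream. The function name top_words and the sample input are made up for this example; each stage reads its input progressively and runs concurrently with its neighbours.

```shell
# Report the N most frequent words on standard input (N defaults to 3).
# top_words is a hypothetical helper name used only for this sketch.
top_words() {
    tr -cs '[:alpha:]' '\n' |          # split the text into one word per line
        tr '[:upper:]' '[:lower:]' |   # normalise case
        sort |                         # bring identical words together
        uniq -c |                      # count each run of identical words
        sort -rn |                     # highest count first
        head -n "${1:-3}"              # keep only the top N lines
}

printf 'the cat and the dog\nThe end\n' | top_words 1
```

Because every stage is a filter, the same function works unchanged on a file of any size, for example `top_words 10 < book.txt`.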

In addition to anonymous pipes, named pipes (aka FIFOs) allow complex interprocess communication (IPC); see NamedPipes.
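A minimal sketch of a named pipe connecting a background writer to a foreground reader follows. The directory and file names are illustrative, and mktemp -d is assumed to be available (it is widespread but not required by POSIX).

```shell
# Create the FIFO inside a private temporary directory.
dir=$(mktemp -d)
fifo="$dir/channel"
mkfifo "$fifo"

# Writer: runs in the background. Opening the FIFO for writing
# blocks until some process opens it for reading.
printf 'hello\nworld\n' > "$fifo" &

# Reader: any filter can consume the FIFO like an ordinary file.
received=$(tr '[:lower:]' '[:upper:]' < "$fifo")

wait             # reap the writer
rm -r "$dir"     # a FIFO persists on disk until it is removed
echo "$received"
```

Unlike an anonymous pipe, the two ends need not be started by the same command line: any process that knows the path can open it.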

Process management

Process management allows one to run and control multiple processes asynchronously (without blocking). If a command ends with &, the process runs in the background and the shell immediately proceeds to the next command. Note that & is a command separator, like ; – you can write a & b analogously to a; b, and you never need to write &; (which is in fact a syntax error).
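The following sketch shows the effect of & on ordering. The messages are placeholders; the special parameter $! holds the process ID of the most recently started background command.

```shell
out=$(
    { sleep 1; echo "background finished"; } &   # started asynchronously
    bg=$!                                        # PID of the background job
    echo "foreground continues"                  # runs at once, without waiting
    wait "$bg"                                   # block until the job ends
)
echo "$out"

# As a separator, & behaves like ; except that the left-hand command
# is not waited for, so the relative order of these two lines may vary.
echo one & echo two
wait
```

The first two lines of output always appear in the order "foreground continues", then "background finished", because the background group sleeps before printing.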

Shells can manage processes (and generally should manage only their own children), though in practice only limited facilities are practical:

  • wait for one or more processes to complete, via wait, which can take multiple process IDs;
  • terminate a process, via kill;
  • check whether a process still exists (and may be signalled), via kill -0, which sends no signal.
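The three facilities above can be combined as in the sketch below. The sleep command stands in for real work and the variable names are illustrative. The exit status that wait reports for a signalled child is shell-dependent; most shells use 128 plus the signal number.

```shell
sleep 30 &                     # placeholder for a long-running child
worker=$!

# kill -0 delivers nothing; its exit status says whether the PID exists.
if kill -0 "$worker" 2>/dev/null; then before=running; else before=gone; fi

kill "$worker"                 # send the default signal, TERM

status=0
wait "$worker" || status=$?    # reap the child and collect its exit status

if kill -0 "$worker" 2>/dev/null; then after=running; else after=gone; fi

echo "$before -> $after (wait status $status)"
```

Calling wait before the second check matters: until the child has been reaped it remains a zombie, and kill -0 still succeeds on it.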

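However, shells do not have access to select(2) or poll(2), and the standard wait cannot return as soon as any one of a set of processes completes; this severely limits what can be done. Some shells lift this restriction as an extension, for example wait -n in bash 4.3 and later.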

By default kill sends TERM. In principle it is also possible to send other signals to processes (for example kill -s HUP), but this is rare in scripts. See ProcessManagement for extensive details.

To simply run several processes in parallel, use xargs with the -P flag, pexec, or more cleanly GNU parallel (sem is often useful).
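As a rough sketch of the xargs approach, the example below squares eight numbers with up to four workers running at once. The -P flag is assumed to be available (it is provided by GNU and BSD xargs but is not in POSIX), and the inline sh -c worker is a stand-in for a real command.

```shell
# Each input line becomes the single argument ($1) of one worker process.
squares=$(
    printf '%s\n' 1 2 3 4 5 6 7 8 |
        xargs -P 4 -n 1 sh -c 'echo "$(( $1 * $1 ))"' sh |
        sort -n          # workers finish in any order, so sort the results
)
echo "$squares"
```

The trailing sh after the quoted script fills $0, so that the argument appended by xargs arrives as $1.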

When job control is active, starting a pipeline of connected processes (and their children) creates a process group. The process group ID is the process ID of the group leader, normally the first process in the pipeline (whereas $! gives the last), and a signal can be sent to the whole process group by calling kill with a negative argument (remember -- so that it is not interpreted as a flag!): kill 123 kills process 123, while kill -- -123 kills process group 123. pkill and pgrep select processes by different fields, such as parent process ID or process name (the latter is not recommended, as it is fragile). To get the process group ID of a process, use ps. Note that there is no simple way to refer to “a process and its descendants” (pgrep and pkill can select by parent, which finds immediate children, but they have no recursive option), so killing or signalling all descendants (but not the whole process group) is tricky; see “Best way to kill all child processes”.
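The sketch below looks up a pipeline's process group with ps and signals the whole group. It rests on several assumptions: set -m is used to request job control inside a script (without it, background jobs normally stay in the script's own group), ps accepts the POSIX -o and -p options, and the yes | sleep pipeline is only a placeholder. If the pipeline turns out to share the script's group, the sketch falls back to killing a single process so that it does not signal itself.

```shell
set -m                     # request job control: each job gets its own group
yes | sleep 30 &           # placeholder pipeline; yes blocks once the pipe is full
last=$!                    # PID of the last process in the pipeline
set +m

pgid=$(ps -o pgid= -p "$last" | tr -d ' ')   # group of the pipeline
own=$(ps -o pgid= -p "$$" | tr -d ' ')       # group of this script

if [ "$pgid" != "$own" ]; then
    kill -- "-$pgid"       # negative argument: signal every process in the group
    how=group
else
    # No separate group: a group kill would also hit this script.
    kill "$last"           # yes then fails on its next write (broken pipe)
    how=single
fi

wait "$last" 2>/dev/null || true   # reap; a non-zero status is expected here
echo "terminated by $how kill"
```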

Note that shells typically have a concept of a “job” as well, where a “job” is a shorthand for referring to a process group (typically a number starting from 1), and is used by the fg and bg commands. This is typically only useful in interactive sessions – in scripts, using the process group ID (or simply the process ID) is simpler and more robust. The concepts of process or process group (OS) and job (shell) are easily confused, so process management is often referred to as job control even when it is not interactive.

Traps

Complementary to sending signals, shell scripts can receive signals from other processes and set up handlers to respond to them, using trap. Traps are mainly used by longer-running scripts, especially daemons (background processes), so that they can be cleanly terminated, reloaded, suspended, or resumed. The most common uses are cleaning up on receiving TERM or INT (KILL and STOP cannot be trapped) and reloading configuration on receiving HUP. Other signals that can be usefully trapped include TSTP (sent when a job is suspended or “stopped” from the terminal) and CONT (sent when a job is resumed or “continued”) – this allows explicit suspension and continuation handlers. See SignalTrap: Sending and Trapping Signals for details.
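A minimal sketch of a daemon-style loop with HUP and TERM handlers follows. The function name daemon, the FIFO used to report events back to the parent, and the messages are all invented for this example; a real daemon would re-read its configuration and remove its own temporary files in the handlers. The parent reads each event before sending the next signal, so the two processes stay in step.

```shell
dir=$(mktemp -d)
fifo="$dir/events"
mkfifo "$fifo"

daemon() {
    reloads=0
    # HUP: stand-in for reloading configuration.
    trap 'reloads=$((reloads + 1)); echo "reloaded $reloads" > "$fifo"' HUP
    # TERM: stand-in for cleanup, followed by a normal exit.
    trap 'echo "stopped after $reloads reload(s)" > "$fifo"; exit 0' TERM
    echo "ready" > "$fifo"          # handlers are installed from here on
    while :; do
        sleep 1 &                   # placeholder for real work
        wait "$!"                   # wait is interrupted by trapped signals
    done
}

daemon &
pid=$!

read -r ready < "$fifo"             # block until the handlers exist
kill -s HUP "$pid"
read -r reloaded < "$fifo"
kill -s TERM "$pid"
read -r stopped < "$fifo"
wait "$pid"
rm -r "$dir"

printf '%s\n' "$ready" "$reloaded" "$stopped"
```

Running the work in the background and waiting for it is a deliberate choice: a shell runs a trap handler only after the current foreground command finishes, whereas the wait builtin returns as soon as a trapped signal arrives.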