Bash Shell Scripting/Environment

	Caution: Take your time with this section. These concepts are relatively straightforward once you understand them, but they are different in important ways from analogous concepts in other programming languages. Many programmers and systems administrators, including some who are experienced in Bash, find them counter-intuitive at first.

Subshells

In Bash, one or more commands can be wrapped in parentheses, causing those commands to be executed in a "subshell". (There are also a few ways that subshells can be created implicitly; we will see those later.) A subshell receives a copy of the surrounding context's "execution environment", which includes any variables, among other things; but any changes that the subshell makes to the execution environment are not copied back when the subshell completes. So, for example, this script:

#!/bin/bash

foo=bar
echo "$foo" # prints 'bar'

# subshell:
(
  echo "$foo" # prints 'bar' - the subshell inherits its parent's variables
  baz=bip
  echo "$baz" # prints 'bip' - the subshell can create its own variables
  foo=foo
  echo "$foo" # prints 'foo' - the subshell can modify inherited variables
)

echo "$baz" # prints nothing (just a newline) - the subshell's new variables are lost
echo "$foo" # prints 'bar' - the subshell's changes to old variables are lost

prints this:

bar
bar
bip
foo

bar

	Tip: If you need to call a function that modifies one or more variables, but you don't actually want those variables to be modified, you can wrap the function call in parentheses, so it takes place in a subshell. This will "isolate" the modifications and prevent them from affecting the surrounding execution environment. (That said: when possible, it's better to write functions in such a way that this problem doesn't arise to begin with. As we'll see soon, the local built-in command can help with this.)

The same is true of function definitions; just like a regular variable, a function defined within a subshell is not visible outside the subshell.
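As a minimal sketch of the last two points, the script below first wraps a call to a variable-modifying function in a subshell, and then defines a function inside a subshell. The names counter, increment, and helper are hypothetical, chosen only for illustration.

```bash
#!/bin/bash

counter=1

# A function that modifies a global variable as a side effect.
function increment ()
{
  counter=$((counter + 1))
  echo "inside: $counter"
}

( increment )              # prints 'inside: 2' - the call runs in a subshell
echo "outside: $counter"   # prints 'outside: 1' - the modification was isolated

(
  function helper () { echo 'helper called'; }
  helper                   # prints 'helper called'
)

# helper was defined only in the subshell, so it does not exist here.
# (declare -F succeeds only if the named function is defined.)
if declare -F helper > /dev/null ; then
  helper_visible=yes
else
  helper_visible=no
fi
echo "helper visible: $helper_visible"   # prints 'helper visible: no'
```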

A subshell also delimits changes to other aspects of the execution environment; in particular, the cd ("change directory") command only affects the subshell. So, for example, this script:

#!/bin/bash

cd /
pwd # prints '/'

# subshell:
(
  pwd # prints '/' - the subshell inherits the working directory
  cd home
  pwd # prints '/home' - the subshell can change the working directory
) # end of subshell

pwd # prints '/' - the subshell's changes to the working directory are lost

prints this:

/
/
/home
/

	Tip: If your script needs to change the working directory before running a given command, it's a good idea to use a subshell if possible. Otherwise it can become hard to keep track of the working directory when reading a script. (Alternatively, the `pushd` and `popd` built-in commands can be used to similar effect.)
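The two approaches mentioned in this tip might look like the following sketch. The variable names start_dir, after_subshell, and after_popd are hypothetical and exist only so that the result can be checked.

```bash
#!/bin/bash

start_dir=$PWD

# Option 1: a subshell. The directory change ends with the subshell.
( cd / && pwd )    # prints '/'
after_subshell=$PWD

# Option 2: pushd and popd. pushd changes directory and remembers the old
# one; popd returns to it. Both print the directory stack, so their output
# is discarded here.
pushd / > /dev/null
pwd                # prints '/'
popd > /dev/null
after_popd=$PWD

[[ "$after_subshell" == "$start_dir" ]] && echo 'subshell: directory restored'
[[ "$after_popd" == "$start_dir" ]] && echo 'popd: directory restored'
```

One practical difference: the subshell restores the directory even if the commands inside it fail partway through, whereas a popd has to be reached explicitly.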

An exit statement within a subshell terminates only that subshell. For example, this script:

#!/bin/bash
( exit 0 ) && echo 'subshell succeeded'
( exit 1 ) || echo 'subshell failed'

prints this:

subshell succeeded
subshell failed

As in a script as a whole, an exit statement with no argument returns the exit status of the last-run command, and a subshell that has no explicit exit statement likewise returns the exit status of its last-run command.
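These rules can be checked with a short sketch. The variable status is a hypothetical helper; it is reset to 0 before each subshell and set only when the subshell returns a nonzero status.

```bash
#!/bin/bash

status=0
( true ; false ) || status=$?
echo "no exit statement: $status"   # prints 'no exit statement: 1'

status=0
( false ; exit ) || status=$?
echo "bare exit: $status"           # prints 'bare exit: 1'

status=0
( false ; exit 0 ) || status=$?
echo "exit 0: $status"              # prints 'exit 0: 0'

status=0
( exit 3 ) || status=$?
echo "exit 3: $status"              # prints 'exit 3: 3'
```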

Environment variables

We have already seen that, when a program is called, it receives a list of arguments that are explicitly listed on the command line. What we haven't mentioned is that it also receives a list of name-value pairs called "environment variables". Different programming languages offer different ways for a program to access an environment variable; C programs can use getenv("variable_name") (and/or accept them as a third argument to main), Perl programs can use $ENV{'variable_name'}, Java programs can use System.getenv().get("variable_name"), and so forth.

In Bash, environment variables are simply made into regular Bash variables. So, for example, the following script prints out the value of the HOME environment variable:

#!/bin/bash
echo "$HOME"

The reverse, however, is not true: regular Bash variables are not automatically made into environment variables. So, for example, this script:

#!/bin/bash
foo=bar
bash -c 'echo $foo'

will not print bar, because the variable foo is not passed into the bash command as an environment variable. (bash -c script arguments… runs the one-line Bash script script.)

To turn a regular Bash variable into an environment variable, we have to "export" it into the environment. The following script does print bar:

#!/bin/bash
export foo=bar
bash -c 'echo $foo'

Note that export doesn't just create an environment variable; it actually marks the Bash variable as an exported variable, and later assignments to the Bash variable will affect the environment variable as well. That effect is illustrated by this script:

#!/bin/bash
foo=bar
bash -c 'echo $foo' # prints nothing
export foo
bash -c 'echo $foo' # prints 'bar'
foo=baz
bash -c 'echo $foo' # prints 'baz'

The export command can also be used to remove a variable from the environment, by including the -n option; for example, export -n foo undoes the effect of export foo. (The variable itself is not unset; it simply goes back to being a regular Bash variable.) And multiple variables can be exported or unexported in a single command, such as export foo bar or export -n foo bar.
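A brief sketch of exporting and then unexporting, using the same illustrative variable names as the rest of this section:

```bash
#!/bin/bash

export foo=bar baz=bip
bash -c 'echo "foo=$foo baz=$baz"'   # prints 'foo=bar baz=bip'

export -n foo                        # foo is no longer exported...
echo "$foo"                          # prints 'bar' - ...but it is still set in this shell
bash -c 'echo "foo=$foo baz=$baz"'   # prints 'foo= baz=bip'
```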

It's important to note that environment variables are only ever passed into a command; they are never received back from a command. In this respect, they are similar to regular Bash variables and subshells. So, for example, this script:

#!/bin/bash
export foo=bar
bash -c 'foo=baz' # has no effect
echo "$foo" # prints 'bar'

prints bar; the change to $foo inside the one-line script doesn't affect the process that invoked it. (However, it would affect any scripts that were called in turn by that script.)
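The parenthetical point can be illustrated with a nested invocation. In this sketch, the inner one-line script changes foo and then starts a further script, which sees the new value because the variable arrived already exported:

```bash
#!/bin/bash
export foo=bar

# The middle script modifies its own copy of foo, then runs a third script.
bash -c 'foo=baz ; bash -c "echo \$foo"'   # prints 'baz'

echo "$foo"   # prints 'bar' - the original process is unaffected
```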

If a given environment variable is desired for just one command, the syntax var=value command may be used, with the syntax of a variable assignment (or multiple variable assignments) preceding a command on the same line. (Note that, despite using the syntax of a variable assignment, this is very different from a normal Bash variable assignment, in that the variable is automatically exported into the environment, and in that it only exists for the one command. If you want to avoid the confusion of similar syntax doing dissimilar things, you can use the common Unix utility env for the same effect. That utility also makes it possible to remove an environment variable for one command, or even to remove all environment variables for one command.) If $var already exists as a regular, non-exported variable, and it's desired to include its current value in the environment for just one command, that can be written as var="$var" command.
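The following sketch shows these forms side by side. It assumes an env that supports the -u option (GNU and BSD versions do, but POSIX does not require it); the -i option is specified by POSIX.

```bash
#!/bin/bash

unset foo

foo=bar bash -c 'echo "one-off: $foo"'       # prints 'one-off: bar'
echo "afterwards: ${foo-unset}"              # prints 'afterwards: unset'

env foo=bar bash -c 'echo "via env: $foo"'   # prints 'via env: bar'

export foo=baz

# -u removes a single variable from the command's environment
# (assumption: a GNU or BSD env; -u is not required by POSIX).
env -u foo bash -c 'echo "removed: ${foo-unset}"'   # prints 'removed: unset'

# -i starts the command with an empty environment. "$BASH" (the full path
# of the running Bash) is used because PATH is cleared as well.
env -i "$BASH" -c 'echo "empty: ${foo-unset}"'      # prints 'empty: unset'
```

The expansion ${foo-unset} yields the word unset only when foo is not set at all, which makes it easy to distinguish a missing variable from an empty one.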

An aside: sometimes it's useful to put variable definitions — or function definitions — in one Bash script (say, header.sh) that can be called by another Bash script (say, main.sh). We can see that simply invoking that other Bash script, as ./header.sh or as bash ./header.sh, will not work: the variable definitions in header.sh would not be seen by main.sh, not even if we "exported" those definitions. (This is a common point of confusion: export exports variables into the environment so that other processes can see them, but they're still only seen by child processes, not by parents.) However, we can use the Bash built-in command . ("dot") or source, which runs an external file almost as though it were a shell function. If header.sh looks like this:

foo=bar
function baz ()
{
  echo "$@"
}

then this script:

#!/bin/bash
. header.sh
baz "$foo"

will print 'bar'.
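To see the difference between running and sourcing in a single runnable script, the sketch below writes the same header.sh into a temporary directory (the tmpdir variable and the use of mktemp are assumptions made for the sake of a self-contained example), runs it once as a child process, and then sources it.

```bash
#!/bin/bash

# Create a throwaway copy of header.sh in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/header.sh" <<'EOF'
foo=bar
function baz ()
{
  echo "$@"
}
EOF

unset foo

bash "$tmpdir/header.sh"          # runs in a child process
echo "after bash: ${foo-unset}"   # prints 'after bash: unset'

. "$tmpdir/header.sh"             # runs in the current shell
echo "after dot: ${foo-unset}"    # prints 'after dot: bar'
baz "$foo"                        # prints 'bar'

rm -r "$tmpdir"
```

Giving . a path that contains a slash, as here, avoids any lookup; when the file name contains no slash, Bash searches the directories in PATH for it first.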

Scope

We have now seen some of the vagaries of variable scope in Bash. To summarize what we've seen so far:

Regular Bash variables are scoped to the shell that contains them, including any subshells in that shell.
- They are not visible to any child processes (that is, to external programs).
- If they are created inside a subshell, they are not visible to the parent shell.
- If they are modified inside a subshell, those modifications are not visible to the parent shell.
- This is also true of functions, which in many ways are similar to regular Bash variables.
Function-calls are not inherently run in subshells.
- A variable modification within a function is generally visible to the code that calls the function.
Bash variables that are exported into the environment are scoped to the shell that contains them, including any subshells or child processes in that shell.
- The export built-in command can be used to export a variable into the environment. (There are other ways as well, but this is the most common way.)
- They differ from non-exported variables only in that they are visible to child processes. In particular, they are still not visible to parent shells or parent processes.
External Bash scripts, like other external programs, are run in child processes. The . or source built-in command can be used to run such a script internally, in which case it's not inherently run in a subshell.

To this we now add:

Bash variables that are localized to a function-call are scoped to the function that contains them, including any functions called by that function.
- The local built-in command can be used to localize one or more variables to a function-call, using the syntax local var1 var2 or local var1=val1 var2=val2. (There are other ways as well — for example, the declare built-in command has the same effect — but this is probably the most common way.)
- They differ from non-localized variables only in that they disappear when their function-call ends. In particular, they still are visible to subshells and child function-calls. Furthermore, like non-localized variables, they can be exported into the environment so as to be seen by child processes as well.

In effect, using local to localize a variable to a function-call is like putting the function-call in a subshell, except that it only affects the one variable; other variables can still be left non-"local".
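A small sketch of these properties: a localized variable is seen by a subshell, can be exported to a child process, and disappears when the function-call ends. The function name f is hypothetical.

```bash
#!/bin/bash

foo=outer

function f ()
{
  local foo=inner
  ( echo "subshell sees: $foo" )               # prints 'subshell sees: inner'
  export foo
  bash -c 'echo "child process sees: $foo"'    # prints 'child process sees: inner'
}

f
echo "after f: $foo"                           # prints 'after f: outer'
```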

	Tip: A variable that is set inside a function (either via assignment, or via a for-loop or other built-in command) should be marked as "local" using the built-in command `local`, so as to avoid accidentally affecting code outside the function, unless it is specifically desired that the caller see the new value.
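For instance, a loop variable and an accumulator inside a function are typical candidates for local. This is only a sketch; the function sum_args is hypothetical.

```bash
#!/bin/bash

function sum_args ()
{
  # Without this line, the function would overwrite the caller's $i and $total.
  local i total=0
  for i in "$@" ; do
    total=$((total + i))
  done
  echo "$total"
}

i=unchanged
total=unchanged
sum_args 1 2 3     # prints '6'
echo "$i $total"   # prints 'unchanged unchanged'
```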

It's important to note that, although local variables in Bash are very useful, they are not quite as local as local variables in most other programming languages, in that they're seen by child function-calls. For example, this script:

#!/bin/bash

foo=bar

function f1 ()
{
  echo "$foo"
}

function f2 ()
{
  local foo=baz
  f1 # prints 'baz'
}

f2

will actually print baz rather than bar. This is because the original value of $foo is hidden until f2 returns. (In programming language theory, a variable like $foo is said to be "dynamically scoped" rather than "lexically scoped".)

One difference between local and a subshell is that whereas a subshell initially takes its variables from its parent shell, a statement like local foo immediately hides the previous value of $foo; that is, $foo becomes locally unset. If it is desired to initialize the local $foo to the value of the existing $foo, we must explicitly specify that, by using a statement like local foo="$foo".
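The difference can be seen in a short sketch, assuming Bash's default settings. The function names f and g are hypothetical.

```bash
#!/bin/bash

foo=bar

function f ()
{
  local foo
  echo "plain local: ${foo-unset}"   # prints 'plain local: unset'
}

function g ()
{
  # The right-hand side is expanded before foo is localized,
  # so it still refers to the caller's value.
  local foo="$foo"
  echo "initialized local: $foo"     # prints 'initialized local: bar'
  foo=changed                        # affects only the local copy
}

f
g
echo "$foo"   # prints 'bar'
```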

When a function exits, variables regain the values they had before their local declarations (or they simply become unset, if they had previously been unset). Interestingly, this means that a script such as this one:

#!/bin/bash

function f ()
{
  foo=baz
  local foo=bip
}

foo=bar
f
echo "$foo"

will actually print baz: the foo=baz statement in the function takes effect before the variable is localized, so the value baz is what is restored when the function returns.

And since local is simply an executable command, a function can decide at execution-time whether to localize a given variable, so this script:

#!/bin/bash

function f ()
{
  if [[ "$1" == 'yes' ]] ; then
    local foo
  fi
  foo=baz
}

foo=bar
f yes # modifies a localized $foo, so has no effect
echo "$foo" # prints 'bar'
f # modifies the non-localized $foo, setting it to 'baz'
echo "$foo" # prints 'baz'

will actually print

bar
baz