An Awk Primer/A Note on Awk in Shell Scripts

Awk is an excellent tool for building UNIX/Linux shell scripts, but there are potential pitfalls. Say we have a scriptfile named "testscript", and it takes two filenames as parameters:

   testscript myfile1 myfile2

If we're executing Awk commands from a file, handling the two filenames isn't very difficult. We can initialize variables on the command line as follows:

   cat "$1" "$2" | awk -f testscript.awk f1="$1" f2="$2" > tmpfile

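The Awk program can then use two variables, "f1" and "f2", that are initialized from the script's command-line parameters "$1" and "$2". Variables assigned this way are set before Awk starts reading its input, but they are not yet available inside a BEGIN block.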
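As a minimal sketch of this arrangement, the following script builds two small sample files and a one-line "testscript.awk", then runs the pipeline from a shell function that stands in for "testscript". The file contents, the Awk program body, and the function wrapper are all assumptions made for illustration; only the command-line form comes from the text above.

```shell
#!/bin/sh
# Work in a scratch directory so the example leaves nothing behind.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/myfile1"
printf 'gamma\n'       > "$tmpdir/myfile2"

# Hypothetical testscript.awk: report the line count and both file names.
cat > "$tmpdir/testscript.awk" <<'EOF'
END { print NR " lines read from " f1 " and " f2 }
EOF

# Stand-in for the script "testscript"; $1 and $2 are its two parameters.
testscript() {
    cat "$1" "$2" | awk -f testscript.awk f1="$1" f2="$2"
}

result=$(cd "$tmpdir" && testscript myfile1 myfile2)
echo "$result"

rm -rf "$tmpdir"
```

Because the input arrives through "cat", Awk has no way of knowing the original file names; passing them as "f1" and "f2" is what makes them available to the program.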

Things get more awkward when we specify the Awk commands directly in the script, which is preferable where possible since it reduces the number of files needed to implement the script. The problem is that "$1" and "$2" have different meanings to the scriptfile and to Awk. To the scriptfile, they are command-line parameters, but to Awk they refer to text fields in the input.

How these variables are handled depends on whether the Awk program is enclosed in double quotes (" ") or in single quotes (' '). If we invoke Awk as follows:

   awk "{ print \"This is a test: \" $1 }" $1

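we won't get anything printed for "$1". Inside double quotes, the shell replaces "$1" with the filename before Awk ever runs, so Awk sees the bare word "myfile1", which it treats as an uninitialized (empty) variable rather than as a reference to the first field. If we instead use single quotes, the scriptfile leaves the Awk field variables alone, and we can pass in scriptfile parameters by assigning them to Awk variables on the command line: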

   awk '{ print "This is a test: " $1 " / parm2 = " f }' f="$2" < "$1"

For each line of "myfile1", this prints the first field of that line, followed by the name of "myfile2", which was passed in through the Awk variable "f".
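The difference between the two quoting styles can be seen side by side in a small sketch. The sample file contents, the square brackets around the printed value, and the "demo" function (which stands in for the scriptfile so that "$1" and "$2" behave as its parameters) are assumptions added for illustration.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'one two\nthree four\n' > "$tmpdir/myfile1"
: > "$tmpdir/myfile2"            # an empty second file; only its name is used

demo() {
    # Double quotes: the shell expands $1 to "myfile1" first, so Awk sees
    # an uninitialized variable named myfile1 and prints an empty string.
    awk "{ print \"double: [\" $1 \"]\" }" "$1"

    # Single quotes: Awk receives $1 untouched and treats it as field 1;
    # the shell's $2 is passed in through the Awk variable f.
    awk '{ print "single: [" $1 "] parm2 = " f }' f="$2" < "$1"
}

out=$(cd "$tmpdir" && demo myfile1 myfile2)
echo "$out"

rm -rf "$tmpdir"
```

With this sample data, the double-quoted form prints empty brackets for each input line, while the single-quoted form prints "one" and "three" along with the name "myfile2".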

Remember that Awk is relatively slow and clumsy and should not be regarded as the default tool for all scriptfile jobs. We can use "cat" to concatenate files, "head" and "tail" to extract a given number of lines from the start or end of a file, "grep" or "fgrep" (equivalent to "grep -F") to find lines matching a pattern in a file, and "sed" to do search-and-replace on a stream of text.
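As a rough illustration of these simpler tools, the sketch below runs each of them on a small made-up file. The file names and contents are assumptions chosen only to keep the results easy to check.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'apple 1\nbanana 2\ncherry 3\ndate 4\n' > "$tmpdir/fruits"
printf 'elderberry 5\n' > "$tmpdir/more"

# cat: join two files into one.
cat "$tmpdir/fruits" "$tmpdir/more" > "$tmpdir/all"

first=$(head -n 2 "$tmpdir/fruits")              # first two lines
last=$(tail -n 1 "$tmpdir/all")                  # last line of the joined file
found=$(grep 'an' "$tmpdir/fruits")              # lines containing "an"
swapped=$(sed 's/apple/apricot/' "$tmpdir/fruits" | head -n 1)   # search-replace

printf '%s\n' "$first" "$last" "$found" "$swapped"

rm -rf "$tmpdir"
```

Each of these jobs could also be written in Awk, but the dedicated tools are shorter to invoke for such single-purpose tasks.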