An Awk Primer/Awk Program Example

A Large Program

All this is fun, but each of these examples only seems to nibble away at "coins.txt". Why not have Awk figure out everything interesting at one time?

The immediate objection to this idea is that it would be impractical to enter a lot of Awk statements on the command line, but that's easy to fix. The commands can be written into a file, and then Awk can execute the commands from that file.

awk -f <awk program file name> <input file name>
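As a minimal sketch of the -f option, the shell script below writes a one-line Awk program to a file and then runs it against a small input file. The file names count.awk and sample.txt and the three input lines are hypothetical, made up only for this illustration. The script assumes a POSIX shell with awk and mktemp available.

```shell
# Hypothetical file names and data, used only for illustration.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Put the Awk program in a file instead of typing it on the command line.
cat > count.awk <<'EOF'
# Print how many input lines were read.
END { print NR }
EOF

# A tiny hypothetical input file with three records.
printf 'gold 1\nsilver 10\ngold 0.5\n' > sample.txt

# -f count.awk: read the program from count.awk; sample.txt is the input.
out=$(awk -f count.awk sample.txt)
echo "$out"

# Remove the temporary directory.
cd / && rm -rf "$workdir"
```

Running the script prints 3, the value of NR after all three input lines have been read.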

Given the ability to write an Awk program in this way, what should a "master" analysis program for "coins.txt" do? Here's one possible output:

Summary data for coin collection:

    Gold pieces:                   nn
    Weight of gold pieces:         nn.nn
    Value of gold pieces:        nnnn.nn

    Silver pieces:                 nn 
    Weight of silver pieces:       nn.nn
    Value of silver pieces:      nnnn.nn

    Total number of pieces:        nn  
    Value of collection:         nnnn.nn

The "Master" Program

The following Awk program generates this information:

# This is an awk program that summarizes a coin collection.
/gold/    { num_gold++; wt_gold += $2 }                           # Get weight of gold.
/silver/  { num_silver++; wt_silver += $2 }                       # Get weight of silver.
END {
    val_gold = 485 * wt_gold;                                     # Compute value of gold.
    val_silver = 16 * wt_silver;                                  # Compute value of silver.
    total = val_gold + val_silver;

    print "Summary data for coin collection:";
    printf("\n");                                                 # Prints a blank line.
    printf("    Gold pieces:\t\t%4i\n", num_gold);
    printf("    Weight of gold pieces:\t%7.2f\n", wt_gold);
    printf("    Value of gold pieces:\t%7.2f\n", val_gold);
    printf("\n");
    printf("    Silver pieces:\t\t%4i\n", num_silver);
    printf("    Weight of silver pieces:\t%7.2f\n", wt_silver);
    printf("    Value of silver pieces:\t%7.2f\n", val_silver);
    printf("\n");
    printf("    Total number of pieces:\t%4i\n", NR);
    printf("    Value of collection:\t%7.2f\n", total);
}

This program has a few interesting features:

  • Comments can be inserted in the program by preceding them with a #. Awk ignores everything from the # to the end of the line.
  • Note the statements num_gold++ and num_silver++. C programmers will recognize the ++ operator; for everyone else, it simply increments the specified variable by one. There is also a -- operator that decrements the variable by one.
  • Multiple statements can be written on the same line by separating them with a semicolon (;). Semicolons are optional if there is only one statement on the line.
  • Note the use of the printf statement, which offers more flexible printing capabilities than the print statement.
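The short script below is a sketch that exercises the features just listed: a comment, two statements separated by a semicolon, ++, and --. The input lines come from the shell's printf and are hypothetical, and the variable names n, wt, and left are made up for the example.

```shell
# Three hypothetical input lines: a metal name and a weight.
out=$(printf 'gold 1\nsilver 10\ngold 0.5\n' | awk '
# Everything from # to the end of the line is ignored by Awk.
/gold/ { n++; wt += $2 }   # two statements on one line, separated by ;
END {
    left = n
    left--                 # -- subtracts one from left
    print n, wt, left      # prints: 2 1.5 1
}')
echo "$out"
```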

The printf Statement

printf has the general syntax:

printf("<format_string>", <parameters>)

The format string contains ordinary text, special characters, and format codes.

Special Characters:

  • \n New line
  • \t Tab (aligned spacing)

Format Codes:

  • %i or %d Integer
  • %f Floating-point (decimal) number
  • %s String

The above description of printf is oversimplified. There are many more codes and options, which will be discussed later.
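Here is a small sketch that uses each of the basic format codes. The values are arbitrary (borrowed from the sample output for familiarity), and the program runs in a BEGIN block, which executes before any input is read, so no input file is needed.

```shell
# BEGIN runs before any input is read, so no input file is needed here.
out=$(awk 'BEGIN {
    printf("%s: %d gold, %i silver\n", "Pieces", 9, 4)   # %s string, %d and %i integers
    printf("Weight: %f\n", 6.1)                          # %f defaults to 6 decimal places
}')
echo "$out"
```

The first line prints as "Pieces: 9 gold, 4 silver" and the second as "Weight: 6.100000".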

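There is one format code for each of the parameters in the list, and each format code determines how its corresponding parameter is printed. Numbers between the % and the letter control the layout. For example, the format code %2d tells Awk to print an integer right-justified in a field at least two characters wide, and the format code %7.2f tells Awk to print a floating-point number in a field at least seven characters wide, with two digits to the right of the decimal point. The decimal point counts as one of the seven characters. If a value needs more room than the field width allows, the field simply grows; nothing is truncated.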
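The sketch below makes field widths visible by wrapping each printed field in square brackets. The bracket markers and the sample values are only for illustration.

```shell
# Square brackets mark where each printed field starts and ends.
out=$(awk 'BEGIN {
    printf("[%2d]\n", 7)          # padded on the left to 2 characters: [ 7]
    printf("[%2d]\n", 12345)      # wider than 2: the field grows, nothing is cut off
    printf("[%7.2f]\n", 6.1)      # 7 characters, decimal point included: [   6.10]
    printf("[%7.2f]\n", 2958.5)   # exactly 7 characters: [2958.50]
}')
echo "$out"
```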

Note also that, in this example, each string printed by printf ends with \n, the code for a newline (the ASCII line-feed character). The print statement automatically ends each line it prints with a newline, but printf does not, so by default the next output statement continues on the same line. Putting \n in the format string forces the output to move to the next line.
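A minimal sketch of the difference: several printf calls build up one line until a \n ends it, while print finishes its own line. The labels and numbers are illustrative only.

```shell
out=$(awk 'BEGIN {
    printf("Gold pieces: ")       # no newline yet, so the next output continues this line
    printf("%d", 9)
    printf("\n")                  # now end the line
    print "Silver pieces:", 4     # print adds the newline itself
}')
echo "$out"
```

The output is two lines: "Gold pieces: 9" and "Silver pieces: 4".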

Each \t advances the output to the next tab stop; terminals usually place tab stops every 8 columns. Tabs are useful for creating tables and neatly aligned output.
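To see why the master program puts two tabs after the short label "Gold pieces:" but only one after "Weight of gold pieces:", the sketch below prints those two lines and pipes them through expand, a standard Unix utility that replaces tabs with spaces using tab stops every 8 columns by default. In both lines the value then starts at column 33.

```shell
# Two lines from the master program, with tabs expanded to spaces (8-column stops).
out=$(awk 'BEGIN {
    printf("    Gold pieces:\t\t%4i\n", 9)               # 16 characters, then tabs to columns 25 and 33
    printf("    Weight of gold pieces:\t%7.2f\n", 6.1)   # 26 characters, then one tab to column 33
}' | expand)
echo "$out"

# Show only what starts at column 33: the two right-justified values.
echo "$out" | cut -c33-
```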

Running the Program

I stored this program in a file named "summary.awk" and invoked it as follows:

awk -f summary.awk coins.txt

The output was:

Summary data for coin collection:

    Gold pieces:		   9
    Weight of gold pieces:	   6.10
    Value of gold pieces:	2958.50

    Silver pieces:		   4
    Weight of silver pieces:	  12.50
    Value of silver pieces:	 200.00

    Total number of pieces:	  13
    Value of collection:	3158.50

Practice

You now have enough information to make good use of Awk. The next chapter provides a much more complete description of the language.

  1. Modify the above program to tally and display the countries from "coins.txt".
  2. Write a program that counts and displays the number of blank and non-blank lines (use NF).
  3. Modify the program from #2 to compute the average number of words per line.