An Awk Primer/Standard Functions

Below is a list of the standard Awk functions. Optional arguments are shown in square brackets.

Numerical functions

Numerical functions work with numbers. All of them return a number and have only numerical parameters, or no parameters at all.

  • int(x) returns x rounded towards zero. For example, int(-3.9) returns -3, while int(3.9) returns 3.
  • sqrt(x) returns the square root of x.
  • exp(x) returns e raised to the power x.
  • log(x) returns the natural logarithm of x.
  • sin(x) returns the sine of x, where x is in radians.
  • cos(x) returns the cosine of x, where x is in radians.
  • atan2(y,x) works like the function of the same name in C or C++; see below for more information.
  • rand() returns a pseudo-random number in the [0,1) interval (that is, it is at least 0 and less than 1). If the same program runs more than once, some implementations (e.g. GNU Awk 3.1.8) produce the same series of random numbers, while others (e.g. mawk 1.3.3) produce a different series each time. A program that needs a different series on every run should call srand() once before the first call to rand().
  • srand([x]) sets x as the seed of the random number generator. Without an argument, it uses the time of day as the seed. It returns the previous seed.
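
The numerical functions can be tried directly from the command line. The following is a minimal sketch, assuming a POSIX-compatible awk is installed as `awk`; the seed value 42 and the variable names are arbitrary choices made for illustration.

```shell
awk 'BEGIN {
    # int() truncates towards zero; it does not round to nearest.
    print int(-3.9), int(3.9)            # -3 3

    # sqrt(), exp() and log() return floating-point values.
    printf "%.4f %.4f %.4f\n", sqrt(2), exp(1), log(10)   # 1.4142 2.7183 2.3026

    # Seeding twice with the same value replays the same series.
    srand(42); a = rand()
    srand(42); b = rand()
    same = (a == b)
    in_range = (a >= 0 && a < 1)
    print same, in_range                 # 1 1
}'
```

The value returned by rand() itself is not shown, because it depends on the implementation.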

atan2

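atan2(y,x) returns the angle $\theta$, in radians, such that:

  • $-\pi \le \theta \le \pi$
  • $\sin\theta = \dfrac{y}{\sqrt{x^2+y^2}}$
  • $\cos\theta = \dfrac{x}{\sqrt{x^2+y^2}}$

The formulas are

\[
\operatorname{atan2}(y,x) =
\begin{cases}
\arctan\dfrac{y}{x} & \text{if } x > 0, \\
\arctan\dfrac{y}{x} + \pi & \text{if } x < 0 \text{ and } y \ge 0, \\
\arctan\dfrac{y}{x} - \pi & \text{if } x < 0 \text{ and } y < 0, \\
+\dfrac{\pi}{2} & \text{if } x = 0 \text{ and } y > 0, \\
-\dfrac{\pi}{2} & \text{if } x = 0 \text{ and } y < 0, \\
\text{undefined} & \text{if } x = 0 \text{ and } y = 0.
\end{cases}
\]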
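
A few sample values may make the quadrant handling of atan2 easier to see. This is a small sketch, assuming a POSIX-compatible awk installed as `awk`; the variable name pi is chosen here for illustration and is not built into Awk.

```shell
awk 'BEGIN {
    pi = atan2(0, -1)                    # angle of the point (-1, 0)
    printf "%.4f\n", pi                  # 3.1416
    printf "%.4f\n", atan2(1, 1)         # 0.7854, first quadrant (pi/4)
    printf "%.4f\n", atan2(-1, -1)       # -2.3562, third quadrant (-3*pi/4)
    printf "%.1f\n", atan2(1, 0) * 180 / pi   # 90.0, converted to degrees
}'
```

Since Awk has no built-in constant for pi, computing it as atan2(0, -1) is a common idiom.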

String functions

String functions work with strings. All of them have at least one string parameter, which sometimes can be omitted. For most of them, all parameters are strings and/or regular expressions.

Note that in Awk strings, characters are numbered from 1. For example, in the string "cat", the character number 1 is "c", the character number 2 is "a", the character number 3 is "t".
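
The numbering from 1 can be checked with a short loop. This is only an illustrative sketch, assuming a POSIX-compatible awk installed as `awk`; it uses the length, substr and index functions described in the list that follows.

```shell
awk 'BEGIN {
    s = "cat"
    # Positions run from 1 to length(s), not from 0.
    for (i = 1; i <= length(s); i++)
        print i, substr(s, i, 1)
    print index(s, "t")                  # 3
}'
```

The loop prints "1 c", "2 a" and "3 t", one pair per line.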

Below, s and t are strings, regexp is a regular expression.

  • length([s]) returns the number of characters in s (in $0 by default).
  • substr(s, m [,n]) returns the substring of s that starts at the m-th character and is n characters long. For example, substr("string", 3, 2) returns "ri". If n is omitted, or if n specifies more characters than are left in the string, the function returns the substring of s from the m-th character to the last character. For m less than 1 or negative n, the behaviour is undefined and differs between implementations; gawk, for example, returns an empty string when n is zero or negative.
  • split(s, A [,regexp]) splits s into the array A of fields, using regexp (FS by default) as the delimiter. If regexp is empty ("" or //), some implementations (e.g. gawk) split s into individual characters, while others (e.g. mawk 1.3.3) return an array of one element that contains the whole string s. Returns the number of fields.
  • sprintf(format [,expression, ..., expression]) - formats the expressions in the same way as the C and C++ function sprintf and returns the result. See the Wikipedia article for more information.
  • gsub(regexp, s [,t]) - in t ($0 by default), replaces every match of regexp with s. Returns the number of substitutions.
  • sub(regexp, s [,t]) - in t ($0 by default), replaces the first match of regexp with s. If there is no match, does nothing and returns 0; otherwise returns 1.
    • In sub() and gsub(), & in the string s stands for the whole matched text. Use \& for a literal &. Note that \& must be typed as \\& inside an Awk string constant, because the backslash is itself an escape character in Awk strings.
  • index(s, t) - returns the index of the first occurrence of t in s, or 0 if s does not contain t. Example: index("hahaha", "ah") returns 2, while index("hahaha", "foo") returns 0.
  • match(s, regexp) - like index, but searches for a regular expression rather than a string. It also sets RSTART to the return value and RLENGTH to the length of the matched substring. If there is no match, it returns 0 and sets RSTART to 0 and RLENGTH to -1. If the empty string is matched, RSTART is set to the index of the first character after the match (length(s)+1 if the match is at the end), and RLENGTH is set to 0.
  • tolower(s) - returns a copy of s with uppercase characters turned to lowercase.
  • toupper(s) - returns a copy of s with lowercase characters turned to uppercase.
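
The string functions above can be combined as in the following sketch. It assumes a POSIX-compatible awk installed as `awk`; the sample date string and the variable names (part, t, u, v, k) are made up for illustration.

```shell
awk 'BEGIN {
    s = "2024-01-15"

    # split() returns the number of pieces; here "-" is the separator.
    n = split(s, part, "-")
    print n, part[1], part[2], part[3]            # 3 2024 01 15

    # sub() changes only the first match, gsub() changes all of them.
    t = s; sub(/-/, "/", t); print t              # 2024/01-15
    u = s; k = gsub(/-/, "/", u); print k, u      # 2 2024/01/15

    # In the replacement, & stands for the matched text.
    v = "hahaha"; gsub(/ah/, "[&]", v); print v   # h[ah][ah]a

    # match() sets RSTART and RLENGTH as a side effect.
    if (match("hahaha", /(ah)+/))
        print RSTART, RLENGTH                     # 2 4

    print toupper(substr("string", 3, 2))         # RI
    print sprintf("%05.1f|%-4s|", 3.14159, "ab")  # 003.1|ab  |
}'
```

Note that sub() and gsub() modify their third argument in place, which is why the sketch copies s into t and u before calling them.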

System function

There is only one system function.

  • system(s) runs the string s as a shell command and returns its exit status. For example, system("ls -l") runs the command "ls -l", which, under Linux or any other Unix-like system, lists the contents of the current directory in long format.
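
In POSIX awk, system() returns the exit status of the command, which a program can test. The following is a minimal sketch, assuming a POSIX-compatible awk installed as `awk` on a Unix-like system; the commands "exit 3" and "test -d /" are arbitrary examples.

```shell
awk 'BEGIN {
    # The command runs in a shell; its exit status is returned.
    status = system("exit 3")
    print "exit status:", status         # exit status: 3

    # Zero conventionally means success.
    if (system("test -d /") == 0)
        print "root directory exists"
}'
```

If the command itself writes output, that output may not be in order with the text printed by awk unless awk's own output is flushed first.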

GNU Awk extensions

String functions

  • gensub(regexp, s, h [, t]) replaces the h-th match of regexp by s in the string t ($0 by default). For example, gensub(/o/, "O", 3, t) replaces the third "o" by "O" in t.
    • Unlike sub() and gsub(), it returns the result, while the string t remains unchanged.
    • If h is a string starting with g or G, replaces all matches.
    • As in sub() and gsub(), & in the string s stands for the whole matched text. Use \& for a literal &. As before, \& must be typed as \\& inside an awk string constant, because the backslash is itself an escape character there.
    • Unlike sub() and gsub(), \0 in the string s means the same as &, while \1 ... \9 stand for the 1st ... 9th parenthesized subexpression.
      • As above, \0 ... \9 must be typed as \\0 ... \\9 for the same reason.

Several examples of using gensub:

  • print(gensub(/o/, "O", 3, "cooperation")) prints cooperatiOn
  • print(gensub(/o/, "O", "g", "cooperation")) prints cOOperatiOn
  • print(gensub(/o+/, "(&)", "g", "cooperation")) prints c(oo)perati(o)n
  • print(gensub(/(o+)(p+)/, "<[\\1](\\2)>", "g", "I oppose any cooperation")) prints I <[o](pp)>ose any c<[oo](p)>eration
  • split has an additional optional parameter. If you call it as split(s, A [,regexp]), it works as before: it splits the string s into the array A of fields, using regexp (FS by default) as the delimiter. But if you call it as split(s, A, regexp, B), it also fills the array B with the separators. For example, split("s;tr;;ing", A, ";+", B) sets the array A to ("s","tr","ing") and the array B to (";", ";;").
  • patsplit(s, A [,regexp [,B]]) is similar to split, but regexp describes the fields themselves rather than the separators. For example, split("s;tr;;ing", A, ";+", B) is roughly similar to patsplit("s;tr;;ing", A, "[a-z]+", B). The default for regexp is FPAT rather than FS.
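
The difference between the four-argument split and patsplit can be seen side by side. This sketch assumes that GNU Awk version 4.0 or later is installed as `gawk`; the array names P and S and the variable names are chosen only for illustration.

```shell
gawk 'BEGIN {
    s = "s;tr;;ing"

    # Four-argument split(): A gets the fields, B the separators.
    n = split(s, A, /;+/, B)
    print n, A[1], A[2], A[3]            # 3 s tr ing
    print B[1], B[2]                     # ; ;;

    # patsplit(): the regexp describes the fields themselves.
    m = patsplit(s, P, /[a-z]+/, S)
    print m, P[1], P[2], P[3]            # 3 s tr ing
    print S[1], S[2]                     # ; ;;

    # gensub() returns the new string and leaves its target unchanged.
    t = "cooperation"
    r = gensub(/o/, "0", 2, t)
    print r, t                           # co0peration cooperation
}'
```

These calls are GNU extensions, so the script is not expected to run under mawk or other implementations.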

Array functions

Below, A and B are arrays.

  • length(A) returns the number of elements in A.
  • asort(A[,B]) - if B is not given, discards the indices of A and sorts its values. The indices of A are replaced by sequential integers starting with 1. If B is given, copies A to B, then sorts B as above, while A remains unchanged. Returns the number of elements in A.
  • asorti(A[,B]) - if B is not given, discards the values of A and sorts its indices. The sorted indices become the new values, and sequential integers starting with 1 become the new indices. As in the previous case, if B is given, copies A to B, then sorts B's indices as above, while A remains unchanged. Returns the number of elements in A.
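
The two-argument forms are often the safer choice, since they leave the original array intact. The following is a small sketch, assuming GNU Awk is installed as `gawk`; the array contents and the names B and C are invented for illustration.

```shell
gawk 'BEGIN {
    A["x"] = 30; A["y"] = 10; A["z"] = 20

    # asort() with two arguments: B gets the sorted values, A is untouched.
    n = asort(A, B)
    print n, B[1], B[2], B[3]            # 3 10 20 30

    # asorti(): C gets the sorted indices of A as its values.
    asorti(A, C)
    print C[1], C[2], C[3]               # x y z

    # A still has its original indices and values.
    print length(A), A["x"]              # 3 30
}'
```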

Other standard functions

GNU Awk also has:

  • time functions
  • bit manipulation functions
  • internationalization functions.

See the man page (man gawk) for more information.