
Frequently asked questions

General questions

What is the difference between awk, nawk, mawk, gawk etc?

AWK is one of the oldest software tools found on every UNIX-like operating system. The original implementation, written at Bell Labs in 1977 by Alfred Aho, Peter Weinberger and Brian Kernighan, is simply called awk. The language was significantly revised and expanded in 1985-88. GNU AWK (gawk), written by Paul Rubin, Jay Fenlason and Richard Stallman, was released in 1988 and has been maintained by Arnold Robbins since 1994.

The source of Brian Kernighan's nawk (New AWK) was first released, unpublicized, in 1993, and has been publicly available since the late 1990s. mawk, by Mike Brennan, is a very fast AWK implementation based on a bytecode interpreter; it has been maintained by Thomas Dickey since 2009. tawk (Thompson AWK) is an AWK compiler for Solaris, DOS, OS/2 and Windows, formerly sold by Thompson Automation Software, which has since ceased operations. Other implementations exist as well, such as Jawk, an implementation of AWK in Java.

No standard dictates which AWK implementation is installed on a given system; that is a matter of software packaging and distribution, and a system may well have more than one AWK installed. Which one runs under the generic name awk is usually a matter of symbolic links and the PATH setting. Most implementations, including gawk, report their version with the --version option (mawk also accepts -W version), and the shell's command -v, type or hash builtins can help find out which file the name awk resolves to.
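A minimal sketch of those checks in a POSIX shell. The variable name awk_bin is only for illustration, and the output naturally differs from system to system:

```shell
# Which file does the bare name "awk" resolve to through PATH?
awk_bin=$(command -v awk)
echo "awk resolves to: $awk_bin"

# Is it a symbolic link (e.g. to gawk or mawk, possibly via /etc/alternatives)?
ls -l "$awk_bin"

# Ask the implementation itself; fall back to the mawk-style option.
awk --version </dev/null 2>/dev/null ||
    awk -W version </dev/null 2>/dev/null ||
    echo "could not determine the awk version"
```

The version queries read from /dev/null so that an implementation which does not understand the option cannot end up waiting for input on the terminal.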

Technical questions

How can I define local variables in AWK?

AWK is very loose about variables. There is no need to declare a variable in AWK: a variable comes to life the first time it is used. This may look convenient, but it is not, because every variable created this way is global, and global objects are confusing and bug-prone. In fact, AWK has two kinds of global objects: functions and variables.
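The sketch below shows the kind of bug this invites. It is an illustrative program, not taken from any real script: the function name sum_to() and the shell wrapper leak_demo are hypothetical. The loop counter i inside sum_to() is never declared, so it is the very same global variable as the caller's loop counter:

```shell
leak_demo() {
    awk '
    function sum_to(n) {             # s and i are never declared: both are global
        s = 0
        for (i = 1; i <= n; i++)
            s += i
        return s
    }
    BEGIN {
        for (i = 1; i <= 3; i++) {   # meant to make three calls ...
            sum_to(10)               # ... but this call leaves the shared i at 11
            print "after call, i =", i
        }
    }'
}
leak_demo
```

It prints a single line, after call, i = 11: the outer loop runs once instead of three times, because sum_to() has silently overwritten the caller's i.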

However, not all variables in AWK are global: a function's parameters have function scope, that is, they are local to that function. We can make use of this fact to "define" local variables. The usual practice is to list the desired local variables as extra function parameters, after the normal ones. To make this clear to human readers, the normal parameters are separated from the local variables by some extra space (usually 1-2 tabs):

...
function total(amount,			tot, i) {
	tot = 0

	for (i in amount)
		tot += amount[i]

	return tot
}
...

In the above AWK snippet we define the function total, which takes an array of amounts as its argument and returns the sum of the array elements (the total amount). To iterate over the array we need a local variable for the array index (i) and another one for accumulating the amount (tot). These two variables are made local to total simply by listing them as function parameters, even though we never pass values for them: AWK allows a function to be called with fewer arguments than it has parameters, and the missing ones start out uninitialized (empty string, or zero in numeric context). Calls to total always pass just one argument, the array of amounts to sum.
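To see that tot and i really are local, here is a minimal, self-contained sketch that puts the function above into a small awk program run from a POSIX shell. The shell function total_demo and the array a are only for illustration. A global variable that is also called i is left untouched by the call:

```shell
total_demo() {
    awk '
    function total(amount,          tot, i) {   # tot and i are locals
        tot = 0
        for (i in amount)
            tot += amount[i]
        return tot
    }
    BEGIN {
        i = "global"                 # a global variable that shares the name i
        a["x"] = 10; a["y"] = 20; a["z"] = 30
        print total(a)               # the sum of the amounts
        print i                      # still "global": total() used its own i
    }'
}
total_demo
```

It prints 60 followed by global.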

If the function takes no normal arguments, we simply leave the space before the desired local variables:

...
function function_name(			i, j) {
...

In the above AWK snippet, i and j are local to the function function_name, which is called without arguments: function_name().
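A small sketch of such a function, assuming a POSIX shell and any awk. The function print_fields() and the wrapper fields_demo are hypothetical: print_fields() takes no arguments, uses i only as a local loop counter, and prints the fields of the current input line:

```shell
fields_demo() {
    printf 'alpha beta gamma\n' | awk '
    function print_fields(          i) {   # no real parameters; i is a local
        for (i = 1; i <= NF; i++)
            print i ": " $i
    }
    { print_fields() }'
}
fields_demo
```

For the input line alpha beta gamma it prints 1: alpha, 2: beta and 3: gamma, one per line.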

In what order is an array scanned/traversed?

By default, the order in which a for (key in array) loop traverses an array is unspecified: the awk implementation determines it, usually as a side effect of how it stores arrays internally, so it can vary from one implementation (or version) of awk to the next. Scripts should not rely on it. Read more in the relevant section of the GNU AWK User's Guide.
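Because the order is unspecified, a portable script that needs a predictable order has to sort the keys itself. The sketch below is one way to do that, not a standard routine: a hypothetical helper sorted_keys() copies the indices into a numerically indexed array and sorts them with an insertion sort, which is adequate for small arrays. gawk users can instead set PROCINFO["sorted_in"] (for example to "@ind_str_asc") or call asorti(), but both are gawk extensions.

```shell
sorted_demo() {
    awk '
    # Hypothetical helper: copy the indices of arr into keys[1..n],
    # sort them in ascending order (insertion sort), and return n.
    function sorted_keys(arr, keys,          n, k, i, j, tmp) {
        n = 0
        for (k in arr)                   # visits indices in unspecified order
            keys[++n] = k
        for (i = 2; i <= n; i++) {       # insertion sort of keys[1..n]
            tmp = keys[i]
            for (j = i - 1; j >= 1 && keys[j] > tmp; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = tmp
        }
        return n
    }
    BEGIN {
        count["pear"] = 3; count["apple"] = 5; count["fig"] = 1
        split("", keys)                  # make keys an empty array
        n = sorted_keys(count, keys)
        for (i = 1; i <= n; i++)         # now the order is predictable
            print keys[i], count[keys[i]]
    }'
}
sorted_demo
```

With these string keys it prints apple 5, fig 1 and pear 3, in that order, whatever order the for (k in arr) loop happened to use.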