AWK by example

Perhaps the easiest way to learn AWK is to read AWK code written by expert programmers. Most AWK programs are simple and easy to read, even for newcomers. There are no heavy frameworks to learn and no difficult APIs to grasp before you can understand existing AWK programs written by experienced programmers. In the following pages you can read some interesting AWK programs, and after a while you may find yourself writing your own.

AWK scripting

After more than 30 years of successful use, AWK has proved to be one of the most frequently and heavily used programs, especially on UNIX/Linux systems, either as an everyday tool or embedded in shell scripts and other programs. Every self-respecting developer or administrator knows (or should know) how to use AWK, though not all AWK users are AWK experts. The truth is that one does not have to be an AWK expert to make productive use of AWK. For example, a mail server administrator can produce almost every kind of report from mail system logs without senior-level AWK expertise; a little AWK knowledge suffices to write useful programs.

But why is AWK so easy to grasp where other similar programs are not? That's an interesting question with an even more interesting answer. AWK was written by three of the most brilliant computer scientists of their time, namely Brian Kernighan, Alfred Aho and Peter Weinberger. AWK was neither the first nor by any means the only program developed by these three people; on the contrary, by then they had already developed a large body of difficult software, from compilers and device drivers to operating system kernels, leading software engineering into the modern era.

Through their tremendous experience in software engineering, the creators of AWK knew that to make something work you keep it simple. They also knew that premature optimisation is the root of all evil, as Donald Knuth put it. Kernighan, Aho and Weinberger thus had at their disposal deep knowledge of almost every aspect of computer science when the idea of a multi-purpose, programmable pattern/action software tool led them to AWK.

On the other hand, the designers of AWK knew better than anybody else what to expect from AWK, because they developed it for their own needs and purposes rather than to cover hypothetical universal needs. Since then, missing features have been added as needed, leading from awk to nawk and eventually to gawk, which is the gorilla of the various AWK implementations.

AWK in shell scripting

AWK has been used in shell scripting from the very beginning of its existence in the late 1970s. A great many UNIX/Linux shell scripts involve one or more awk calls. There is no magic about why awk is so heavily used in shell scripting: the people who created awk did not try to construct a super magic tool to handle every kind of need in computer science, but just tried to make their own lives easier in everyday hacking.

Aho, Weinberger and Kernighan were three of the best programmers of their time, so they knew exactly how to divide and conquer: they knew how to split huge tasks into smaller ones, building even the most difficult programs from robust software in small and manageable pieces, most of which carry out the same kind of job again and again. These small software pieces are called software tools, with awk being one of the most frequently used. Other useful software tools are sort, join, sed etc., but only a few of them are programmable the way awk is. Anyway, that's enough talking; let's dive into easy shell scripting with awk's assistance.

procrustes.sh

Our first shell script, procrustes.sh, filters input data according to the value of one of the input columns. That value is checked against minimum and maximum values, and only lines whose value lies between those two limits (inclusive) are printed.

We use the options -m and -M to specify the minimum and maximum, while option -c specifies the column to check (default 1). Either limit may be omitted, in which case that side is not checked. If we want to use a separator other than spaces and tabs, we can specify it with the -t option.

#!/usr/bin/env bash

usage() {
	echo "usage: ${progname} [-m min] [-M max] [-c column] [-t separator] [files...]" >&2
	exit 1
}

progname=$(basename "$0")
errs=
min=
max=
col=1
sep=

while getopts "m:M:c:t:" opt
do
	case "${opt}" in
	m)
		min="${OPTARG}"
		;;
	M)
		max="${OPTARG}"
		;;
	c)
		col="${OPTARG}"
		;;
	t)
		sep="${OPTARG}"
		;;
	\?)
		errs="yes"
		;;
	esac
done

[ -n "${errs}" ] && usage

shift $((OPTIND - 1))

exec awk -v min="${min}" -v max="${max}" -v col="${col}" -v sep="${sep}" 'BEGIN {
	if (sep != "") {
		FS = sep
	}
}
NF < col { next }
(min != "") && ($col < min) { next }
(max != "") && ($col > max) { next }
{ print }' "$@"

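To see the filter in action without saving the script to a file, the same awk program can be wrapped in a small shell function. This is only a sketch: the function name procrustes_demo and the sample data are made up for illustration, and option parsing is replaced by positional arguments (minimum, maximum, column).

```shell
#!/usr/bin/env bash

# procrustes_demo MIN MAX COLUMN
# Hypothetical stand-in for procrustes.sh (not part of the original
# script): it runs the same awk filter on standard input, taking the
# limits and the column as positional arguments instead of options.
procrustes_demo() {
	awk -v min="$1" -v max="$2" -v col="$3" '
	NF < col { next }			# too few columns: skip
	(min != "") && ($col < min) { next }	# below the minimum: skip
	(max != "") && ($col > max) { next }	# above the maximum: skip
	{ print }'
}

# Made-up sample data: keep lines whose second column lies in [10, 20].
printf 'apple 5\npear 12\nplum 30\n' | procrustes_demo 10 20 2
```

With this sample data only the line "pear 12" should survive. An empty limit disables that side of the check, so procrustes_demo '' 20 2 would also keep "apple 5". Note that awk compares the values numerically when both look like numbers and as strings otherwise.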

cstats.sh

Our next script prints character statistics for its input data. This shell script takes no options, but we still use the same overall design. One of the most interesting parts of the awk script inside cstats.sh is the input field separator, which is set to the empty string. In gawk this means that every single input character is treated as a separate field of the input line.

#!/usr/bin/env bash

usage() {
	echo "usage: ${progname}" >&2
	exit 1
}

progname=$(basename "$0")
errs=

while getopts "" opt
do
	case "${opt}" in
	\?)
		errs="yes"
		;;
	esac
done

[ -n "${errs}" ] && usage

shift $((OPTIND - 1))

exec gawk 'BEGIN {
	FS = ""
}

{
	for (i = 1; i <= NF; i++) {
		count[$i]++
	}
}

END {
	n = asorti(count, cidx)
	for (i = 1; i <= n; i++) {
		c = cidx[i]
		printf("%s: %d\n", c, count[c])
	}
}' "$@"

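The empty field separator and asorti are not guaranteed by POSIX awk (asorti is a gawk extension, and splitting on an empty FS is left unspecified by the standard), which is presumably why cstats.sh calls gawk explicitly. As a rough sketch of a more portable alternative, the characters can be picked out with substr and the report ordered by the external sort command. The function name char_counts and the sample input are assumptions made for this illustration, and the ordering may differ slightly from what asorti produces.

```shell
#!/usr/bin/env bash

# char_counts [files...]
# Hypothetical portable variant of cstats.sh (not the original script):
# characters are extracted with substr() and the report is ordered by
# the external sort command.
char_counts() {
	awk '{
		n = length($0)
		for (i = 1; i <= n; i++) {
			count[substr($0, i, 1)]++
		}
	}

	END {
		for (c in count) {
			printf("%s: %d\n", c, count[c])
		}
	}' "$@" | LC_ALL=C sort
}

# Made-up sample input: two short lines.
printf 'abba\ncab\n' | char_counts
```

For the sample input the report should list three occurrences each of "a" and "b" and one of "c". As in cstats.sh, newlines are record separators and are therefore not counted.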

Playing cards with AWK

Our next AWK script creates a 52‑card deck, then shuffles and deals cards for a poker game. We use the symbols C, S, D and H for the clubs, spades, diamonds and hearts suits. We also use the digits 2 through 9 for the ranks of the corresponding cards. We use letters T, J, Q, K and A for the 10s, jacks, queens, kings and aces respectively.

AWK is not an object-oriented programming language, but we can simulate objects and methods using good procedural programming techniques. We can represent each card by a two-character string of the form "SR", where "S" is the suit and "R" is the rank, e.g. "S7" is the Seven of Spades, "HA" is the Ace of Hearts and "DT" is the Ten of Diamonds.

Moreover, we can represent a set of cards as an AWK array. It's good to know how many cards are contained in such a set, so we choose to store the card count in the element with index 0, while the cards themselves are stored in the elements indexed from 1. For example, the cards Seven of Spades, Ace of Hearts, Queen of Clubs and Ten of Diamonds will be stored as [0]4, meaning that there are four cards in the set, followed by [1]S7, [2]HA, [3]CQ, [4]DT. Using the same technique we can represent a whole deck with an array of 53 elements, where the element with index 0 contains the number 52.
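
The representation described above can be tried out in isolation. The following is a minimal sketch, not part of the script that follows: the helper names push_card, card_name and card_set_demo are hypothetical, and only the suits and ranks needed for the four example cards are defined.

```shell
#!/usr/bin/env bash

# card_set_demo: hypothetical demonstration of the "count in element 0"
# convention; all names here are illustrative only.
card_set_demo() {
	awk '
	# Append a card to a set and keep the card count in element 0.
	function push_card(cards, card) {
		cards[0]++
		cards[cards[0]] = card
	}

	# Translate a two-character card code to a readable name.
	function card_name(card) {
		return rank_list[substr(card, 2, 1)] " of " suit_list[substr(card, 1, 1)]
	}

	BEGIN {
		suit_list["S"] = "Spades"; suit_list["H"] = "Hearts"
		suit_list["C"] = "Clubs";  suit_list["D"] = "Diamonds"
		rank_list["7"] = "Seven";  rank_list["A"] = "Ace"
		rank_list["Q"] = "Queen";  rank_list["T"] = "Ten"

		cards[0] = 0
		push_card(cards, "S7")
		push_card(cards, "HA")
		push_card(cards, "CQ")
		push_card(cards, "DT")

		print "cards in set: " cards[0]
		for (i = 1; i <= cards[0]; i++) {
			print "[" i "] " cards[i] " = " card_name(cards[i])
		}
	}'
}

card_set_demo
```

Keeping the count in element 0 means a function that receives the array needs no extra length argument, which is the convention the functions below rely on.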

BEGIN {
	setup()
	create_deck(deck)
	shuffle_cards(deck)
	deal_cards(deck, hands)
	print_hands(hands)
}

function setup() {
	suit_list["C"] = "Clubs"
	suit_list["S"] = "Spades"
	suit_list["D"] = "Diamonds"
	suit_list["H"] = "Hearts"

	rank_list["2"] = "Two"
	rank_list["3"] = "Three"
	rank_list["4"] = "Four"
	rank_list["5"] = "Five"
	rank_list["6"] = "Six"
	rank_list["7"] = "Seven"
	rank_list["8"] = "Eight"
	rank_list["9"] = "Nine"
	rank_list["T"] = "Ten"
	rank_list["J"] = "Jack"
	rank_list["Q"] = "Queen"
	rank_list["K"] = "King"
	rank_list["A"] = "Ace"

	# There exist two functions for printing hands. Function
	# "print_hand_text" prints hands as text, while function
	# "print_hand_html" creates HTML for displaying hands in
	# a web page.

	print_hand = html ? "print_hand_html" : "print_hand_text"

	srand()
}

# Function "create_deck" accepts a cards array as an argument and
# fills it with cards. Cards are indexed from 1 to 52, while index
# 0 contains cards' count (52). Each card is represented as a string
# "SR", where "S" is the suit and "R" is the rank, e.g. "S7" is the
# Seven of Spades, "HT" is the Ten of Hearts and "DQ" is the Queen
# of Diamonds.

function create_deck(deck,		suit, rank) {
	delete deck

	for (suit in suit_list) {
		for (rank in rank_list) {
			deck[0]++
			deck[deck[0]] = suit rank
		}
	}
}

# Function "shuffle_cards" shuffles the deck passed as first parameter.
# We can pass the number of swaps as second parameter, which by default
# is 10 times the deck's card count.

function shuffle_cards(deck, n,		i1, i2, t) {
	if (!n) {
		n = deck[0] * 10
	}

	while (n--) {
		i1 = int(rand() * deck[0]) + 1
		i2 = int(rand() * deck[0]) + 1

		t = deck[i1]
		deck[i1] = deck[i2]
		deck[i2] = t
	}
}

# Function "deal_cards" deals cards from the (shuffled) deck passed as
# first parameter. The second parameter is an array of hands indexed
# from 1 to the players' count (global "n_players"); each hand is an
# array of cards indexed from 1, while index 0 of each hand contains
# the hand's card count. The optional third parameter is the number of
# cards per hand.

function deal_cards(deck, hands, n,		i, j) {
	delete hands

	# By default 5 cards are dealt.

	if (!n) {
		n = 5
	}

	for (i = 1; i <= n_players; i++) {
		hands[i][0] = n
		for (j = 1; j <= n; j++) {
			hands[i][j] = pop_card(deck)
		}
	}
}

# Function "pop_card" pops the last card from the set of cards passed
# as first parameter, usually the whole deck, which can be considered
# a "hand" too. The popped card is returned, while the card count
# (index 0) is decremented by one.

function pop_card(x,		card) {
	if (!x[0]) {
		print "no more cards" >"/dev/stderr"
		exit(1)
	}

	card = x[x[0]]
	x[0]--
	
	return card
}

# Function "print_hands" prints the hands of all players. There exist
# two print functions, one for printing hands as text and the other
# for printing hands as HTML elements.

function print_hands(hands,		i) {
	for (i = 1; i <= n_players; i++) {
		@print_hand(hands[i])
	}
}

# Function "print_hand_text" prints the passed hand as a string.

function print_hand_text(hand,		i) {
	for (i = 1; i <= hand[0]; i++) {
		printf("%s", hand[i])
	}

	print ""
}

# Function "print_hand_html" prints the passed hand as HTML <div> element
# containing the card images.

function print_hand_html(hand,		i) {
	print "<div>"

	for (i = 1; i <= hand[0]; i++) {
		print "<img src=\"../image/cards/" hand[i] ".png\">"
	}

	print "</div>"
}

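The shuffle_cards function above mixes the deck by swapping randomly chosen pairs of cards many times. A common alternative, shown here only as a sketch and not as the script's own method, is the Fisher-Yates shuffle, which makes a single pass over the deck. The names shuffle_fisher_yates and shuffled_deck and the optional seed argument are assumptions introduced for this example; the deck uses the same layout as above, with the card count in element 0.

```shell
#!/usr/bin/env bash

# shuffled_deck [SEED]
# Hypothetical demonstration, not part of the original script: build a
# 52-card deck, shuffle it with Fisher-Yates and print one card per line.
shuffled_deck() {
	awk -v seed="$1" '
	# Walk from the last card down to the second one, swapping each
	# card with a randomly chosen card at or below its position.
	function shuffle_fisher_yates(deck,		i, j, t) {
		for (i = deck[0]; i > 1; i--) {
			j = int(rand() * i) + 1
			t = deck[i]
			deck[i] = deck[j]
			deck[j] = t
		}
	}

	BEGIN {
		# A fixed seed gives a repeatable order; otherwise use the clock.
		if (seed != "") {
			srand(seed)
		} else {
			srand()
		}

		n_suits = split("C S D H", suits, " ")
		n_ranks = split("2 3 4 5 6 7 8 9 T J Q K A", ranks, " ")

		deck[0] = 0
		for (s = 1; s <= n_suits; s++) {
			for (r = 1; r <= n_ranks; r++) {
				deck[0]++
				deck[deck[0]] = suits[s] ranks[r]
			}
		}

		shuffle_fisher_yates(deck)

		for (i = 1; i <= deck[0]; i++) {
			print deck[i]
		}
	}'
}

shuffled_deck 42
```

The single pass performs 51 swaps for a full deck instead of the 520 used by default in shuffle_cards. With a fixed seed the same awk implementation should reproduce the same order, which is handy for testing; the exact order may differ between awk implementations.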

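To get a new set of random 5‑card hands, run the script with gawk, passing the players' count in the variable n_players and, optionally, a nonzero html to print the hands as HTML instead of text (for example with gawk's -v option: -v n_players=4 -v html=1). The script relies on gawk extensions (arrays of arrays and indirect function calls), so other AWK implementations will not run it.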

