In awk, all arrays are associative arrays. What makes an associative array unique is that its index can be a string or a number.
In most programming languages, the indices of arrays are exclusively numeric. In these implementations, an array is a sequence of locations where values are stored. The indices of the array are derived from the order in which the values are stored. There is no need to keep track of indices. For instance, the first element of an array occupies the first location in the array, and its index is the lowest one (0 or 1, depending on the language).
An associative array makes an "association" between the indices and the elements of an array. For each element of the array, a pair of values is maintained: the index of the element and the value of the element. The elements are not stored in any particular order as in a conventional array. Thus, even though you can use numeric subscripts in awk, the numbers do not have the same meaning that they do in other programming languages; they do not necessarily refer to sequential locations. However, with numeric indices, you can still access all the elements of an array in sequence, as we did in previous examples. You can create a loop to increment a counter that references the elements of the array in order.
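As a minimal sketch of the counter loop just described, the following shell function wraps a short awk script. The function name print_in_order and the array name line are our own choices for illustration; the script stores each input line under its record number and then walks the subscripts from 1 to NR.

```sh
# Hypothetical helper: reads lines on standard input, stores each one
# under a numeric subscript, then prints them back in input order.
print_in_order() {
    awk '
    { line[NR] = $0 }                  # NR is 1 for the first record
    END {
        # The counter, not the array, supplies the ordering.
        for (i = 1; i <= NR; i++)
            print i ": " line[i]
    }'
}
```

For instance, you might run it as "print_in_order < somefile". The order comes entirely from the counter i; the array itself records only which value goes with which subscript.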
Sometimes, the distinction between numeric and string indices is important. For instance, if you use "04" as the index to an element of the array, you cannot reference that element using "4" as its subscript. You'll see how to handle this problem in a sample program date-month, shown later in this chapter.
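The following sketch illustrates the mismatch between "04" and "4". The function name index_demo and the array name month are hypothetical. The last step shows one common way to normalize a subscript, adding 0 to force a numeric value; this is not necessarily the approach taken in date-month.

```sh
# Hypothetical demonstration: "04" and "4" are different subscripts.
index_demo() {
    awk 'BEGIN {
        month["04"] = "April"          # stored under the string "04"

        if (4 in month)                # the number 4 becomes the string "4"
            print "4: found"
        else
            print "4: not found"

        if ("04" in month)
            print "04: found"
        else
            print "04: not found"

        # Adding 0 yields the number 4, so this subscript is "4".
        month["04" + 0] = "April"
        if (4 in month)
            print "4: found"
        else
            print "4: not found"
    }'
}
```

The first test fails because the only element at that point is stored under "04"; after the numeric conversion, a second element exists under "4".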
Associative arrays are a distinctive feature of awk, and a very powerful one that allows you to use a string as an index to another value. For instance, you could use a word as the index to its definition. If you know the word, you can retrieve the definition.
For example, you could use the first field of the input line as the index to the second field with the following assignment:
array[$1] = $2
Using this technique, we could take our list of acronyms and load it into an array named acro.
acro[$1] = $2
Each element of the array would be the description of an acronym and the subscript used to retrieve the element would be the acronym itself.
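Here is a minimal sketch of loading such a list and retrieving one description. It assumes that each input line holds an acronym and its description separated by a tab, so that the whole description lands in $2; the function name lookup_acronym and the variable term are hypothetical.

```sh
# Hypothetical helper: lookup_acronym ACRONYM
# Reads tab-separated "acronym<TAB>description" lines on standard input.
lookup_acronym() {
    awk -F'\t' -v term="$1" '
    { acro[$1] = $2 }                  # acronym is the subscript
    END {
        if (term in acro)              # test without creating the element
            print acro[term]
        else
            print term " not found"
    }'
}
```

Using "term in acro" rather than comparing acro[term] to an empty string avoids creating an empty element as a side effect of the lookup.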
......
8.5.1 Multidimensional Arrays
Awk supports linear arrays in which the index to each element of the array is a single subscript. If you imagine a linear array as a row of numbers, a two-dimensional array represents rows and columns of numbers. You might refer to the element in the second column of the third row as "array[3, 2]". Two- and three-dimensional arrays are examples of multidimensional arrays. Awk does not support multidimensional arrays but instead offers a syntax for subscripts that simulates a reference to a multidimensional array. For instance, you could write the following expression:
file_array[NR, i] = $i
where each field of an input record is indexed by its record number and field number. Thus, the following reference:
file_array[2, 4]
would produce the value of the fourth field of the second record.
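To make this concrete, the sketch below fills file_array from the input and then prints one element. The function name field_at and the variables r and f are hypothetical; the loop over the fields is our assumption about where the assignment above would sit in a complete script.

```sh
# Hypothetical helper: field_at RECORD FIELD
# Reads standard input and prints the given field of the given record.
field_at() {
    awk -v r="$1" -v f="$2" '
    {
        # Index every field by record number and field number.
        for (i = 1; i <= NF; i++)
            file_array[NR, i] = $i
    }
    END {
        if ((r, f) in file_array)      # parentheses are required here
            print file_array[r, f]
    }'
}
```

Calling "field_at 2 4" corresponds to the reference file_array[2, 4]. Note that the membership test for a multidimensional subscript puts the subscript list in parentheses.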
This syntax does not create a multidimensional array. It is converted into a string that uniquely identifies the element in a linear array. The components of a multidimensional subscript are interpreted as individual strings ("2" and "4", for instance) and concatenated together, separated by the value of the system variable SUBSEP. The subscript-component separator is defined as "\034" by default, an unprintable character rarely found in ASCII text. Thus, awk maintains a one-dimensional array, and the subscript for our previous example would actually be "2\0344" (the concatenation of "2", the value of SUBSEP, and "4"). A practical consequence of this simulation of multidimensional arrays is that access to individual elements may become slower as the array grows. However, you should time this, using your own application, with different awk implementations (see Chapter 11, A Flock of awks).
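The following sketch shows that the subscript really is a single string. The function name show_subscripts and the array names grid and part are hypothetical. The script stores one element with the two-part syntax, recovers the components by splitting the subscript on SUBSEP, and then reaches the same element with a subscript built by hand.

```sh
# Hypothetical demonstration: a "two-dimensional" subscript is one string.
show_subscripts() {
    awk 'BEGIN {
        grid[2, 4] = "x"

        # Each key is "2" SUBSEP "4"; split() recovers the components.
        for (key in grid) {
            n = split(key, part, SUBSEP)
            print n, part[1], part[2]
        }

        # Building the same string by hand reaches the same element.
        k = "2" SUBSEP "4"
        if (k in grid)
            print "same element"
    }'
}
```

Because the components are joined with SUBSEP, a component that itself contains that character would make two different subscript lists collide, which is why an unprintable character is a sensible default.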