Chinaunix

Title: My understanding of awk arrays

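Author: yy_galois    Time: 2009-09-18 12:52
Title: My understanding of awk arrays
Arrays are one of the key techniques in awk.

awk uses associative arrays, which are very different from C's sequentially indexed arrays, so many beginners find them awkward at first.

Since awk arrays are associative arrays, they are in essence mappings.
When you use an array, picture it as a mapping.

For example,
a[ $1 ] = $2
can be read as f(x) = y, that is, f($1) = $2.

Likewise, a[ $1,  $2 ] = $3
can be read as f(x, y) = z: two independent variables and one dependent variable.

Describing arrays in terms of functions seems to make them a bit easier to understand.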
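A minimal sketch of the mapping view above, assuming a POSIX awk. The sample data (name, subject, score) and the array names a and b are made up for illustration.

```shell
# Hypothetical sample input: name, subject, score.
data='alice math 90
bob math 75
alice art 82'

# a[$1] = $3 : one subscript in, one value out. A later line with the same
# key overwrites the earlier value, since a mapping holds one value per key.
last_score=$(printf '%s\n' "$data" | awk '{ a[$1] = $3 } END { print a["alice"] }')

# b[$1, $2] = $3 : two subscripts in, one value out.
pair_score=$(printf '%s\n' "$data" | awk '{ b[$1, $2] = $3 } END { print b["alice", "math"] }')

echo "$last_score $pair_score"
```

Here a["alice"] ends up holding the score from the last "alice" line, while b["alice", "math"] keeps the math score separately.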
Author: beginner-bj    Time: 2009-09-18 12:57
Well put, thanks!
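Author: blackold    Time: 2009-09-18 13:06
Title: Reply to #1, yy_galois's post
Reading a[x,y] as having two independent variables
is not a great way to think about it.

In fact, you can simply think of an associative array as a set of key/value pairs.

awk's arrays are the same thing as Perl's associative arrays (now called hashes), just less powerful.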
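A small sketch of the key/value view, assuming a POSIX awk. The input lines and the array name count are made up for illustration. The order in which for (k in count) visits keys is unspecified, so the output is piped through sort.

```shell
# Each input line is "key value"; values with the same key are summed.
pairs=$(printf 'apple 3\nbanana 5\napple 4\n' |
  awk '{ count[$1] += $2 }
       END { for (k in count) print k, count[k] }' |
  sort)

echo "$pairs"
```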
Author: yy_galois    Time: 2009-09-18 13:15
Title: Reply to #3, blackold's post
Hehe, it's just a mnemonic.

Author: cxfcxf    Time: 2009-09-18 13:18
Might as well just write it with Perl's multidimensional arrays... Perl is available anyway...
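Author: blackold    Time: 2009-09-18 13:21
Title: Reply to #4, yy_galois's post
But habits matter.

Over time, this kind of "bad idea" will quietly influence you and your programs.

It's like calling a dog a pig: nothing stops you from using that name, but sooner or later it causes problems, unless you live cut off from everyone else.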
Author: richiewu    Time: 2009-09-18 13:34
Habits matter; it's best to understand things correctly from the start.
Author: lucash    Time: 2009-09-18 15:18
I just read a passage describing awk arrays, so I'm posting it here:

8.4.1 Associative Arrays

In awk, all arrays are associative arrays. What makes an associative array unique is that its index can be a string or a number.

In most programming languages, the indices of arrays are exclusively numeric. In these implementations, an array is a sequence of locations where values are stored. The indices of the array are derived from the order in which the values are stored. There is no need to keep track of indices. For instance, the index of the first element of an array is "1" or the first location in the array.

An associative array makes an "association" between the indices and the elements of an array. For each element of the array, a pair of values is maintained: the index of the element and the value of the element. The elements are not stored in any particular order as in a conventional array. Thus, even though you can use numeric subscripts in awk, the numbers do not have the same meaning that they do in other programming languages - they do not necessarily refer to sequential locations. However, with numeric indices, you can still access all the elements of an array in sequence, as we did in previous examples. You can create a loop to increment a counter that references the elements of the array in order.

Sometimes, the distinction between numeric and string indices is important. For instance, if you use "04" as the index to an element of the array, you cannot reference that element using "4" as its subscript. You'll see how to handle this problem in a sample program date-month, shown later in this chapter.
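As a rough illustration of the "04" versus "4" point (not taken from the quoted book; the array names month and norm are made up), the sketch below assumes a POSIX awk. Adding 0 to the key before storing it is one possible way to normalize subscripts, and is not necessarily how the date-month program does it.

```shell
result=$(awk 'BEGIN {
    month["04"] = "April"                      # subscript is the string "04"
    exact = ("04" in month) ? "yes" : "no"     # same string: found
    plain = ("4" in month) ? "yes" : "no"      # "4" is a different string
    norm["04" + 0] = "April"                   # "04" + 0 is the number 4, stored as "4"
    fixed = (4 in norm) ? "yes" : "no"         # number 4 also becomes "4"
    print exact, plain, fixed
}')

echo "$result"
```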

Associative arrays are a distinctive feature of awk, and a very powerful one that allows you to use a string as an index to another value. For instance, you could use a word as the index to its definition. If you know the word, you can retrieve the definition.

For example, you could use the first field of the input line as the index to the second field with the following assignment:

    array[$1] = $2

Using this technique, we could take our list of acronyms and load it into an array named acro.

    acro[$1] = $2

Each element of the array would be the description of an acronym and the subscript used to retrieve the element would be the acronym itself.
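A minimal sketch of loading and querying such an array, assuming a POSIX awk and a tab-separated list. The two sample entries and the key variable are made up for illustration and are not the book's acronym file.

```shell
# Hypothetical acronym list: acronym, TAB, description.
desc=$(printf 'NASA\tNational Aeronautics and Space Administration\nEPA\tEnvironmental Protection Agency\n' |
  awk -F '\t' -v key="EPA" '
    { acro[$1] = $2 }          # subscript: acronym, element: description
    END {
        if (key in acro) {
            print acro[key]
        } else {
            print "unknown"
        }
    }')

echo "$desc"
```

Testing with (key in acro) before reading avoids creating an empty element for an acronym that is not in the list.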

  ......


8.5.1 Multidimensional Arrays

Awk supports linear arrays in which the index to each element of the array is a single subscript. If you imagine a linear array as a row of numbers, a two-dimensional array represents rows and columns of numbers. You might refer to the element in the second column of the third row as "array[3, 2]." Two- and three-dimensional arrays are examples of multidimensional arrays. Awk does not support multidimensional arrays but instead offers a syntax for subscripts that simulate a reference to a multidimensional array. For instance, you could write the following expression:

    file_array[NR, i] = $i

where each field of an input record is indexed by its record number and field number. Thus, the following reference:

    file_array[2, 4]

would produce the value of the fourth field of the second record.

This syntax does not create a multidimensional array. It is converted into a string that uniquely identifies the element in a linear array. The components of a multidimensional subscript are interpreted as individual strings ("2" and "4," for instance) and concatenated together separated by the value of the system variable SUBSEP. The subscript-component separator is defined as "\034" by default, an unprintable character rarely found in ASCII text. Thus, awk maintains a one-dimensional array and the subscript for our previous example would actually be "2\0344" (the concatenation of "2," the value of SUBSEP, and "4"). The main consequence of this simulation of multidimensional arrays is that the larger the array, the slower it is to access individual elements. However, you should time this, using your own application, with different awk implementations (see Chapter 11, A Flock of awks).
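The SUBSEP behaviour described above can be checked with a short sketch, assuming a POSIX awk. The two input records and the names key and parts are made up for illustration.

```shell
out=$(printf 'a b c\nd e f\n' | awk '
    { for (i = 1; i <= NF; i++) file_array[NR, i] = $i }
    END {
        print file_array[2, 3]            # third field of the second record
        key = 2 SUBSEP 3                  # the same subscript built by hand
        print file_array[key]
        found = ((1, 2) in file_array) ? "yes" : "no"
        print found                       # membership test on a compound subscript
        n = split(key, parts, SUBSEP)     # recover the individual components
        print n, parts[1], parts[2]
    }')

echo "$out"
```

Both lookups return the same element, which shows that file_array[2, 3] is only a string subscript in a one-dimensional array.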

        .......



