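Last edited by yinyuemi on 2016-07-26 10:23
It has been a while since my last post. Today let's go over the upgraded array features in awk; the reference is https://www.gnu.org/software/gawk/manual/gawk.html#Arrays
I won't repeat the basics of awk arrays here (the main array usage for 3.0+ is covered at http://bbs.chinaunix.net/thread-2312439-1-1.html). This post focuses on two new array features in gawk 4.0+, so if you haven't upgraded yet, now is the time.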
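1. Controlling array traversal order
Normally, when you walk an array with for (item in array), the order in which elements come out is undefined, in other words "random". Often, though, we want the elements in a specific order, such as ascending or descending by numeric value. The usual workaround is to go through asort or asorti.
Now there is a better way: gawk 4.0+ lets you control the traversal order directly.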
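This relies on gawk's built-in array PROCINFO. You can run the following to dump its contents:
- awk 'BEGIN{for(i in PROCINFO){if(isarray(PROCINFO[i])){for( j in PROCINFO[i])print i,j,PROCINFO[i][j]}else{print i,PROCINFO[i]}}}'
The element that controls traversal order is "sorted_in". It is not set by default, so it will not appear in the dump above. It accepts the values listed below: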
PROCINFO["sorted_in"] | Description
@unsorted | Array indexes are processed in arbitrary order (default awk behavior).
@ind_str_asc | The array is sorted with indexes compared as strings in ascending order.
@ind_num_asc | The array is sorted with indexes compared as numbers in ascending order. Non-numeric indexes are treated as zero.
@val_type_asc | The array is sorted based on values as per its type in ascending order. All numbers come before the strings. The sub-arrays come after the strings.
@val_str_asc | The array is sorted based on values of elements, treating the values as strings, in ascending order.
@val_num_asc | The array is sorted based on values of elements, treating values as numbers, in ascending order.
@ind_str_desc | The array is sorted based on index, treated as strings, in descending order.
@ind_num_desc | The array is sorted based on index, treated as numbers, in descending order.
@val_type_desc | The array is sorted based on the value of the element as per its type in descending order. Subarrays come first, then the strings and lastly, the numbers.
@val_str_desc | The array is sorted based on element values, treated as strings, in descending order.
@val_num_desc | The array is sorted based on values, treated as numbers, in descending order.
Time for some examples:
- # Default mode: no particular order
- awk '
- BEGIN {
-     PROCINFO["sorted_in"] = "@unsorted"
-     fruit["apple"] = 4
-     fruit["mango"] = 12
-     fruit["guava"] = 8
-     fruit["banana"] = 16
-     for (j in fruit)
-         printf("%s: %d numbers\n", j, fruit[j])
- }'
- guava: 8 numbers
- mango: 12 numbers
- apple: 4 numbers
- banana: 16 numbers
- # Ascending by numeric value
- awk '
- BEGIN {
-     PROCINFO["sorted_in"] = "@val_num_asc"
-     fruit["apple"] = 4
-     fruit["mango"] = 12
-     fruit["guava"] = 8
-     fruit["banana"] = 16
-     for (j in fruit)
-         printf("%s: %d numbers\n", j, fruit[j])
- }'
- apple: 4 numbers
- guava: 8 numbers
- mango: 12 numbers
- banana: 16 numbers
- # Descending by index, compared as strings
- awk '
- BEGIN {
-     PROCINFO["sorted_in"] = "@ind_str_desc"
-     fruit["apple"] = 4
-     fruit["mango"] = 12
-     fruit["guava"] = 8
-     fruit["banana"] = 16
-     for (j in fruit)
-         printf("%s: %d numbers\n", j, fruit[j])
- }'
- mango: 12 numbers
- guava: 8 numbers
- banana: 16 numbers
- apple: 4 numbers
As the saying goes, three examples are enough, so I'll stop here.
Doesn't asort/asorti look rather weak next to the sorted_in "control valve"?!
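Friendly reminder: PROCINFO["sorted_in"] is a global setting. Once set, it changes the traversal order of every for (... in ...) loop in the whole awk program, so if you only want it in a limited scope, save and restore it as shown below.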
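- …
- if ("sorted_in" in PROCINFO)
-     save_sorted = PROCINFO["sorted_in"]   # remember the current setting, if any
- PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
- …
- if (save_sorted)
-     PROCINFO["sorted_in"] = save_sorted
- else
-     delete PROCINFO["sorted_in"]          # it was unset before, so unset it again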
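In fact, besides the predefined orderings, sorted_in can also be set to the name of a user-defined comparison function.
Such a function follows this general skeleton:
- function comp_func(i1, v1, i2, v2) # takes at least 4 parameters: index and value of each element
- {
-     # compare elements 1 and 2 in some fashion, then
-     # return < 0 if element 1 sorts first, 0 if they tie, > 0 if element 2 sorts first
- }
An example: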
- awk '
- BEGIN {
-     arr[1] = 10
-     arr[2] = 2
-     arr[3] = 100
-     arr["one"] = 10
-     arr["two"] = 1
-     arr["three"] = 100
-     PROCINFO["sorted_in"] = "cmp_num_val_desc"
-     print "#exactly the same as @val_num_desc"
-     for (i in arr)
-         print "arr["i"] = " arr[i]
-     print "Now change the rule to: 1. string indexes first, numeric indexes after 2. within each group, values descending"
-     PROCINFO["sorted_in"] = "cmp_smart_desc"
-     print "#sort in a smarter way"
-     for (i in arr)
-         print "arr["i"] = " arr[i]
- }
- function cmp_num_val_desc(i1, v1, i2, v2)
- {
-     # numerical value comparison, descending order
-     return (v2 - v1)
- }
- function cmp_smart_desc(i1, v1, i2, v2,    n1, n2)
- {
-     # string indexes before numeric indexes; within each group,
-     # numerical value comparison, descending order
-     n1 = i1 + 0
-     n2 = i2 + 0
-     if (n1 != i1)                              # i1 is not numeric
-         return (n2 != i2) ? (v2 - v1) : -1
-     else if (n2 != i2)                         # only i2 is not numeric
-         return 1
-     return v2 - v1
- }
- '
- #exactly the same as @val_num_desc
- arr[three] = 100
- arr[3] = 100
- arr[one] = 10
- arr[1] = 10
- arr[2] = 2
- arr[two] = 1
- Now change the rule to: 1. string indexes first, numeric indexes after 2. within each group, values descending
- #sort in a smarter way
- arr[three] = 100
- arr[one] = 10
- arr[two] = 1
- arr[3] = 100
- arr[1] = 10
- arr[2] = 2
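2. Arrays of Arrays
With this feature, awk can create true multidimensional arrays instead of simulating them with a one-dimensional array as earlier versions did.
If you are familiar with Perl hashes, you can think of it as a hash of hashes.
First, here is what an array of arrays looks like in real life: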
- a[1][1]=1
- a[1][2]=2
- a[1][3]=3
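Looks familiar, doesn't it? Some other languages use the same notation.
That's right, this is a typical two-dimensional array: the first dimension has the single index 1, and the second dimension has the indexes 1, 2 and 3.
In fact, to keep index usage flexible at every dimension, the old comma-separated subscript form is still supported inside each dimension: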
- a[1][1,"a"]=1
- a[1][2,"a"]=2
- a[1][3,"a"]=3
Moreover, the value at any level can be either a scalar or a subarray:
- a[1][1,"a"]=1
- a[2]=2
- a[3][3][4]=3
So, after all that, how do you print an array of arrays? It's quite simple:
- for (i in array)
-     for (j in array[i])
-         print array[i][j]
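When you don't know whether the value at some level is a scalar or a subarray, add a check.
How do you check? Also simple: newer gawk versions already provide a function for exactly this, isarray().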
The official documentation also provides a ruthlessly thorough walk_array, which reaches every corner of a nested array.
- function walk_array(arr, name,    i)
- {
-     for (i in arr) {
-         if (isarray(arr[i]))
-             walk_array(arr[i], (name "[" i "]"))
-         else
-             printf("%s[%s] = %s\n", name, i, arr[i])
-     }
- }
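I haven't written documentation in a long time, and writing this much in one go has left me feeling drained. Enough talk: one more "combo" that uses both new features together, and then I'll wrap up the post!
Emulating sort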
- cat file
- abc 123 100
- abc 456 100
- abc 456 10
- def 123 10
- def 123 100
- abc 123 1
- xzy 789 0
- # sort: column 1 ascending alphabetically, column 2 ascending numerically, column 3 descending numerically
- sort -k1,1 -k2,2n -k3,3nr file
- abc 123 100
- abc 123 1
- abc 456 100
- abc 456 10
- def 123 100
- def 123 10
- xzy 789 0
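- # sorting with awk 3.0+
- awk '
- {
-     a[$1" "$2" "$3]
-     b[$1] = $1
-     c[$2]
-     d[$3]
- }
- END {
-     nb = asort(b)       # column 1 values, sorted
-     nc = asorti(c, e)   # column 2 keys, sorted as strings
-     nd = asorti(d, f)   # column 3 keys, sorted as strings
-     for (i = 1; i <= nb; i++)
-         for (j = 1; j <= nc; j++)
-             for (k = nd; k >= 1; k--)
-                 if ((b[i]" "e[j]" "f[k]) in a)
-                     print b[i], e[j], f[k]
- }
- ' file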
- abc 123 100
- abc 123 1
- abc 456 100
- abc 456 10
- def 123 100
- def 123 10
- xzy 789 0
- # sorting with gawk 4.0+
- awk '
- {
-     arr[$1][$2][$3]
- }
- END {
-     PROCINFO["sorted_in"] = "@ind_str_asc"
-     for (i in arr) {
-         PROCINFO["sorted_in"] = "@ind_num_asc"
-         for (j in arr[i]) {
-             PROCINFO["sorted_in"] = "@ind_num_desc"
-             for (k in arr[i][j])
-                 print i, j, k
-         }
-     }
- }
- ' file
- abc 123 100
- abc 123 1
- abc 456 100
- abc 456 10
- def 123 100
- def 123 10
- xzy 789 0
Comparing the two awk versions, isn't the gawk 4.0+ one clearer and easier to follow?
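Phew, finally done. I hope this gives you some ideas and help. Treat it as a starting point, and if you spot any mistakes, please don't hesitate to point them out!
One last thing: gawk 4.0 has many other fancy features. If you're interested, have a look at http://bbs.chinaunix.net/thread-3559813-1-1.html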