Chinaunix

Title: Common Questions for awk Beginners

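Author: yinyuemi Time: 2011-04-26 07:32
Last edited by yinyuemi on 2016-12-08 16:14

Going from awk beginner to having some modest skill, I am very grateful to the CUers for their help. I have summarized the problems I ran into and the mistakes I made, for beginners to learn from. I am not a computer science major, so some technical terms may be used incorrectly; corrections and additions are welcome!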

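1. What does the "1" in awk '{code}1' do?

A complete awk program has the form: awk '[pattern]{action} ...'. A missing pattern counts as 1 (always true), and a missing action defaults to {print}.

So awk '1' written out in full is awk '1{print}'; likewise, awk '{print}' written out in full is also awk '1{print}'.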
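
As a small illustration of the point above, here is a sketch with made-up input: the first rule only edits the line, and the trailing 1 is a pattern with no action, so the default {print} prints every (possibly modified) line.

```shell
# The first rule modifies $0 but prints nothing itself;
# the bare pattern "1" is always true, so its default action {print} runs.
out=$(printf 'foo bar\nbaz\n' | awk '{ sub(/foo/, "FOO") } 1')
printf '%s\n' "$out"
# FOO bar
# baz
```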

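2. What is the difference between NR and FNR?

NR: the total number of records read so far, across all input files.

FNR: the number of records read so far from the current file.

NR and FNR differ only when awk processes more than one file. For example: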

cat file
a
b
c
d
e
f
awk '{print "NR = " NR " FNR = " FNR, $0}' file
NR = 1 FNR = 1 a
NR = 2 FNR = 2 b
NR = 3 FNR = 3 c
NR = 4 FNR = 4 d
NR = 5 FNR = 5 e
NR = 6 FNR = 6 f
awk '{print "NR = " NR " FNR = " FNR, $0}' file file
NR = 1 FNR = 1 a
NR = 2 FNR = 2 b
NR = 3 FNR = 3 c
NR = 4 FNR = 4 d
NR = 5 FNR = 5 e
NR = 6 FNR = 6 f
NR = 7 FNR = 1 a
NR = 8 FNR = 2 b
NR = 9 FNR = 3 c
NR = 10 FNR = 4 d
NR = 11 FNR = 5 e
NR = 12 FNR = 6 f

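3. How do I pass a shell variable into awk?

There are two ways:

<1>: awk -v var="$VAR" '{code}'
<2>: awk '{CODE}'"$VAR"'{CODE}'
For example:

VAR=XXX
awk -v var="$VAR" 'BEGIN{print var}'
XXX
awk 'BEGIN{print "'"$VAR"'"}'
XXX

I recommend the first method; it avoids some unnecessary trouble. The second method splices the shell value into the program text, so a value containing quotes or other special characters can break or change the program. See http://bbs.chinaunix.net/thread-1835620-1-1.html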
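
A minimal sketch of why the first method is the safer one, using a made-up value that contains spaces and double quotes. With -v the value arrives as data and never becomes part of the awk program text. (Note that -v still interprets backslash escapes in the value; this sample has none.)

```shell
VAR='say "hi" there'          # hypothetical value with spaces and quotes
# Quoting "$VAR" keeps the shell from splitting it into several words.
out=$(awk -v var="$VAR" 'BEGIN { print var }')
printf '%s\n' "$out"
# say "hi" there
```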

4. Why does OFS have no effect?

Consider this example first:

echo 'aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd' |awk -v OFS="|" '{print $0}'
aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd

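Why did OFS have no effect in the example above? OFS is the output field separator. awk uses it only when it rebuilds $0 from the fields, which happens after a field is assigned (or when print is given several comma-separated arguments). Printing an unmodified $0 outputs the record exactly as it was read. The correct approach is: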

echo 'aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd
aaa bbb ccc ddd' |awk -v OFS="|" '{$1=$1;print $0}'
aaa|bbb|ccc|ddd
aaa|bbb|ccc|ddd
aaa|bbb|ccc|ddd
aaa|bbb|ccc|ddd

As Tim put it, the action $1=$1 is a lie we tell awk, purely to make OFS take effect. NF+=0 is another commonly used way. Reference: http://bbs.chinaunix.net/viewthread.php?tid=1354674&extra=&page=1
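
The three cases can be put side by side in a short sketch (the input line is made up): an untouched $0, a rebuilt $0, and print with comma-separated arguments, which also inserts OFS.

```shell
line='aaa bbb ccc'
out1=$(echo "$line" | awk -v OFS='|' '{ print $0 }')            # $0 untouched
out2=$(echo "$line" | awk -v OFS='|' '{ $1 = $1; print $0 }')   # $0 rebuilt with OFS
out3=$(echo "$line" | awk -v OFS='|' '{ print $1, $3 }')        # commas insert OFS
printf '%s\n' "$out1" "$out2" "$out3"
# aaa bbb ccc
# aaa|bbb|ccc
# aaa|ccc
```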

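5. The same code works for someone else. Why does it fail for me?

There are many possible causes. I list the two most common here; additions are welcome.

<1>: Differences between awk versions. Some gawk extension functions or variables do not exist in nawk, and different versions of gawk or nawk also differ from each other. In that case the code has to be rewritten.

<2>: Text format problems. Check the file with cat -A file; if the lines end in ^M$ (DOS line endings), dos2unix should fix it.

Note: typos are also a possibility.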
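
To make cause <2> concrete, here is a sketch with made-up input that carries DOS line endings. The trailing carriage return becomes part of the last field, so a string comparison that looks correct silently fails; stripping the \r inside awk is one possible alternative to running dos2unix first.

```shell
# printf emits "3\r\n": the field is "3\r", not "3".
out_bad=$(printf '1\r\n2\r\n3\r\n' | awk '$1 == "3" { print "match" }')
# Removing the trailing CR changes $0, so the fields are re-split before the test.
out_ok=$(printf '1\r\n2\r\n3\r\n' | awk '{ sub(/\r$/, "") } $1 == "3" { print "match" }')
printf 'without fix: [%s]\nwith fix: [%s]\n' "$out_bad" "$out_ok"
# without fix: []
# with fix: [match]
```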

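6. Can {n,m} interval expressions be used in awk regular expressions?

Yes. In gawk, use gawk --re-interval (gawk 4.0 and later enable interval expressions by default). Other versions do it differently; additions are welcome.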

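7. What exactly are BEGIN and END?

This can trip up newcomers, so briefly:

BEGIN {action}: runs before any input is read. No record has been read at that point, so the file name and the fields are not available yet. Avoid writing things like the following:

awk 'BEGIN{ filename = FILENAME}' file
# or:
awk 'BEGIN{FS=":"; for(i=2;i<=NF;i++) print $i}' file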

If the BEGIN block uses getline, the situation is different:

cat file
1
2
3
4
5
awk 'BEGIN{while ((getline <"file") > 0) print}' file   # "> 0" stops the loop at end of file and on read errors
1
2
3
4
5

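END {action}:
runs after all input has been processed. It is not applied to each record either, so it cannot match a pattern against every line and run an action; what it can use are the variables and arrays that the main rules filled in.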
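
A typical division of labour between the three parts, sketched with made-up numbers: BEGIN initialises, the main rule runs once per record, and END reports what was collected.

```shell
out=$(printf '1\n2\n3\n' | awk '
  BEGIN { sum = 0 }        # before any input is read
        { sum += $1 }      # once per input record
  END   { print NR, sum }  # after the last record: record count and total
')
printf '%s\n' "$out"
# 3 6
```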

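8. print, printf and sprintf?

print: ordinary printing.

printf: printing with a format string.

sprintf: formats exactly like printf, but it only returns the resulting string and does not print it.

Addition from expert1 (post 12): :) print appends a newline \n by default,
while printf does not. It works like printf in C (awk is a close relative of C) and can print in all kinds of formats, but it adds no newline by default.

For example: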

awk 'BEGIN{var=123; print "var = " var}'
var = 123
awk 'BEGIN{var=123;printf "%s %5f\n", "var =",var}'
var = 123.000000
awk 'BEGIN{var=123;sprintf ("%s %5f\n", "var =",var)}'   # prints nothing: the returned string is discarded
awk 'BEGIN{var1=123;var2=sprintf ("%5f",var1); print "var2 = " var2}'
var2 = 123.000000

9. "a==b?c:d"?

This is the conditional expression, condition ? expression2 : expression3, a shorthand for an if statement. It yields c when a==b is true and d otherwise. Written out in full:

if(a==b) {c} else {d}
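
Because the conditional form is an expression, it can sit directly inside a print statement, which an if statement cannot. A small sketch with made-up input and a threshold of 5 chosen only for illustration:

```shell
# The parentheses keep the ?: expression separate from print's argument list.
out=$(printf '3\n8\n' | awk '{ print $1, ($1 > 5 ? "big" : "small") }')
printf '%s\n' "$out"
# 3 small
# 8 big
```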

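10. How should awk '!a[$0]++' be read?

This is a classic awk one-liner for removing duplicate lines. It is short, but it touches quite a few concepts, explained one by one below:

<1>: "!" is logical NOT.

<2>: a[$0] is an element of the array a, with $0 (the whole line) as its index.

<3>: a[$0]++ increments that element, i.e. a[$0]+=1, while the expression itself yields the old value.

<4>: Put together, how does awk evaluate !a[$0]++? Here is a concrete example: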

cat file
111
222
111
222
333
awk '{print a[$0],!a[$0]++,a[$0],!a[$0],$0}' file
 1 1 0 111
 1 1 0 222
1 0 2 0 111
1 0 2 0 222
 1 1 0 333


https://www.gnu.org/software/gaw ... .html#Increment-Ops
lvalue++

Increment lvalue, returning the old value of lvalue as the value of the expression.

awk '++a[$0]==1' does the same thing as the code above. Do you see why?
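
Both spellings can be run against the same sample data as above. The array name seen is an arbitrary choice; any name works.

```shell
data=$(printf '111\n222\n111\n222\n333')
# Post-increment: the test sees the old count (empty/0 on first sight), so ! makes it true.
out1=$(printf '%s\n' "$data" | awk '!seen[$0]++')
# Pre-increment: the test sees the new count, which is 1 only on first sight.
out2=$(printf '%s\n' "$data" | awk '++seen[$0] == 1')
printf '%s\n' "$out1"
# 111
# 222
# 333
```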

11. How do I print single and double quotes?

awk 'BEGIN {print "single quote --> '\''";print "double quote --> \"" }'
single quote --> '
double quote --> "

A more reliable method, as shown by Tim:

awk 'BEGIN {print "single quote --> \047";print "double quote --> \042" }'

12. How are multiple {} blocks in one awk program executed?

Again, an example makes it clear:

cat file
1
2
3
4
5
awk '$1==3{printf "|| "$0}{printf " @@ "$0}{print $0}' file # this program contains three actions
 @@ 11 # $1==3? no; run action {printf " @@ "$0}; run action {print $0}
 @@ 22 # $1==3? no; run action {printf " @@ "$0}; run action {print $0}
|| 3 @@ 33 # $1==3? yes, run {printf "|| "$0}; run action {printf " @@ "$0}; run action {print $0}
 @@ 44 # $1==3? no; run action {printf " @@ "$0}; run action {print $0}
 @@ 55 # $1==3? no; run action {printf " @@ "$0}; run action {print $0}

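This shows clearly that awk reads the text one line at a time and, for each line, runs the rules in the order they appear in the code. Things are different when an action contains next or exit: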

awk '$1==3{printf "|| "$0;next}{printf "@@ "$0}{print $0}' file
@@ 11
@@ 22
|| 3@@ 44
@@ 55
awk '$1==3{printf "|| "$0;exit}{printf "@@ "$0}{print $0}' file
@@ 11
@@ 22
|| 3
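
The same two behaviours in a form that is easier to read off, as a sketch with made-up input and plain print instead of printf: next abandons the remaining rules for the current record only, while exit stops reading input altogether.

```shell
# next: record 3 is handled by the first rule and never reaches the second.
out_next=$(printf '1\n2\n3\n4\n' | awk '$1 == 3 { print "skip " $0; next } { print "keep " $0 }')
# exit: nothing at or after record 3 is printed.
out_exit=$(printf '1\n2\n3\n4\n' | awk '$1 == 3 { exit } { print }')
printf '%s\n--\n%s\n' "$out_next" "$out_exit"
# keep 1
# keep 2
# skip 3
# keep 4
# --
# 1
# 2
```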

13. FS, OFS, RS, ORS?

Finally, a diagram to illustrate these four variables:

[Figure: Picture2.png, diagram of FS, OFS, RS and ORS]
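
Since the diagram is an image, here is a text sketch of the same idea with made-up separators: RS and FS describe how the input is cut into records and fields, ORS and OFS describe how the output is glued back together.

```shell
# Input: records end with ";", fields are separated by ":".
# Output: fields joined with "-", records terminated with "|".
out=$(printf 'a:b;c:d;' | awk 'BEGIN { FS = ":"; RS = ";"; OFS = "-"; ORS = "|" }
                               { $1 = $1; print }')
printf '%s\n' "$out"
# a-b|c-d|
```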

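Author: yinyuemi Time: 2011-04-26 08:41
Thanks for the support, Tim.

Author: ly5066113 Time: 2011-04-26 08:44
A few places seem not quite accurate.

6. Can {n,m} interval expressions be used in awk regular expressions?
Yes, but it is a GNU extension; use: gawk --re-interval

Regular expressions like {n,m} can be used in any version of awk; they are not a GNU extension. Only the way to use them differs.

7. What exactly are BEGIN and END?
This can trip up newcomers, so briefly:
BEGIN {action}: runs before the text is read, so do not expect to do anything to the text, or obtain any information about it, inside the BEGIN {action}.

That statement is too absolute. In BEGIN you can use getline to work with the text.

11. How do I print single and double quotes?

For this I prefer octal escapes. Relying on the way the shell parses quotes looks messy, just as it does when passing in variables.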

awk 'BEGIN {print "single quote --> \047";print "double quote --> \042" }'

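Author: ywlscpl Time: 2011-04-26 08:45
Support.

Author: yinyuemi Time: 2011-04-26 08:46

A few places seem not quite accurate.

Regular expressions like {n,m} can be used in any version of awk; they are not a GNU extension, only ...
ly5066113 posted on 2011-04-26 08:44

OK, I will fix it right away.

Author: unixlinuxsys Time: 2011-04-26 09:03
Good fellow.

Author: blackold Time: 2011-04-26 09:09
This spirit of sharing absolutely deserves support!

Some points may be open to debate, for example:

10.

"It turns out that the first a[$0] is empty, and since "!" has a higher precedence than "+""

++ and + are different operators, and ++ has a higher precedence than !.

13. Isn't the diagram the wrong way round?

Author: yinyuemi Time: 2011-04-26 09:26
Reply to 7# blackold

The diagram has been corrected. Sharp eyes, blackold!

If ++ has a higher precedence than !, does !a[$0]++ run a[$0]++ first? Shouldn't !a[$0]++ then be 0? I am a bit confused; could you explain?

Author: blackold Time: 2011-04-26 09:38
Reply to 8# yinyuemi

This is the difference between pre-increment and post-increment, which came up a few days ago.

Post-increment: the variable's value is used first, then it is incremented.

When the expression !a[$0]++ is evaluated, its value is the same as !a[$0] (the current value of a[$0] is used first), but a[$0] is incremented once the expression has been evaluated.

The RS values in the diagram are inconsistent with each other.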
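
The post-increment behaviour described in this reply can be checked in isolation. The following sketch uses a plain variable x instead of an array element, which is an arbitrary simplification:

```shell
# !x++ parses as !(x++): the old value is negated, then x is incremented.
out=$(awk 'BEGIN {
  x = 0
  a = !x++; b = x    # old value 0 -> a is 1; x becomes 1
  c = !x++; d = x    # old value 1 -> c is 0; x becomes 2
  print a, b, c, d
}')
printf '%s\n' "$out"
# 1 1 0 2
```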

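Author: yinyuemi Time: 2011-04-26 09:47

Reply to yinyuemi

This is the difference between pre-increment and post-increment, which came up a few days ago.

Post-increment: the variable's value is used first, then it is incremented.
...
blackold posted on 2011-04-26 09:38

I see where I went wrong. I was stuck on the precedence question; now I understand. Thanks, blackold.

Author: zooyo Time: 2011-04-26 10:02
Notice: the author has been banned or deleted; content hidden automatically

Author: expert1 Time: 2011-04-26 10:10
Nice, a good summary.
print appends a newline \n by default,
while printf does not. It works like printf in C (awk is a close relative of C) and can print in all kinds of formats, but it adds no newline by default.

Author: zhaoke0128 Time: 2011-04-26 10:16
Reply to 1# yinyuemi

I am not a computer science major either; learning from the OP.

Author: zooyo Time: 2011-04-26 10:20
Notice: the author has been banned or deleted; content hidden automatically

Author: xiaopan3322 Time: 2011-04-26 10:30
I find I really enjoy doing this kind of thing:

yinyuemi--awk初学之常见问题.pdf (638.51 KB, downloads: 326)

Author: ziyunfei Time: 2011-04-26 11:02

Author: bjyuxiao Time: 2011-04-26 11:30
Many thanks.

Author: yinyuemi Time: 2011-04-26 12:58
Thank you all very much for the support!

Author: xrzs1986 Time: 2011-04-26 23:35
This definitely deserves support. Summarizing what you learn is a really good habit. Learned something~

Author: xrzs1986 Time: 2011-04-26 23:44
Reply to 15# xiaopan3322

Author: davidbeckham921 Time: 2011-04-27 13:37
This is going to be popular, bumping it!~~

Author: laohuanggua Time: 2011-04-28 23:43
Several of the big names have surfaced. How about writing up how to handle multiple files as well?

Author: yinyuemi Time: 2011-04-29 04:14
Last edited by yinyuemi on 2011-04-29 04:21

Reply to 22# laohuanggua

Processing multiple files mostly calls for arrays. I am organizing and summarizing the material on arrays too, and will post it in a few days.

Author: laohuanggua Time: 2011-04-29 09:41
Reply to 23# yinyuemi

First comes the ARGV argument array.

Author: qinyudd Time: 2011-04-29 13:57

All sorts of experts showing up.

Author: lvyuancyx Time: 2011-05-08 21:46

A few places seem not quite accurate.

Regular expressions like {n,m} can be used in any version of awk; they are not a GNU extension, only ...
ly5066113 posted on 2011-04-26 08:44

A single quote can be printed using its ASCII code,
\047 \042, but sometimes this gets troublesome, especially when the single quote is followed by digits. For '2010-10-07', writing \0472010-10-07 will cause problems!

Author: shplpy Time: 2011-05-08 23:03
Nice, support!

Author: yinyuemi Time: 2011-05-09 01:16

A single quote can be printed using its ASCII code,
\047 \042, but sometimes, especially when the single quote is followed by digits, ...
lvyuancyx posted on 2011-05-08 21:46

Write them as separate strings and there is no problem: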

$ awk 'BEGIN{print "\047" "2010-10-07" "\047"}'
'2010-10-07'

Author: blackold Time: 2011-05-09 09:55
Reply to 26# lvyuancyx

$ echo foo|awk '{print "\0472010-10-07\047"}'
'2010-10-07'

Author: ly5066113 Time: 2011-05-09 10:55
You can also do it this way, which may be clearer:

awk 'BEGIN{q="\047";print q"2010-10-07"q}'

Author: lvyuancyx Time: 2011-05-09 18:46

Thanks TIM, blackold, yinyuemi.

Author: tujinguniq Time: 2011-05-13 17:45
Thanks for sharing!!! Great post

Author: benteke Time: 2011-07-11 13:55
Thanks, I learned something.

Author: 1234zhuxu Time: 2011-07-13 17:49
Really educational for beginners!!

Author: tt_yy123 Time: 2011-11-07 09:40
Reply to 1# yinyuemi

awk '$1==3{printf "|| "$0}{printf " @@ "$0}{print $0}' file

What do "||" and "@@" mean? I did not quite understand this part!

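Author: Shell_HAT Time: 2011-11-07 09:59
Last edited by Shell_HAT on 2015-07-06 11:52

Reply to 35# tt_yy123

They are just two ordinary strings with no special meaning; replace them with anything you like.
Their purpose is to let you see which {} block is actually running.

On processing multiple files with awk:

awk takes its input from two sources, standard input and files. The latter supports multiple files, for example:
1. Shell pathname expansion: awk '{...}'  *.txt    #  *.txt is expanded by the shell first into all the *.txt files in the current directory; if the directory holds 1.txt and 2.txt, the command ends up as awk '{...}' 1.txt 2.txt
2. Naming several files directly: awk '{...}' a.txt b.txt c.txt ...
awk processes multiple files by reading them one after another: in the example above it reads a.txt first, then b.txt, and so on.

So, when processing multiple files, how can the program tell which file awk is currently reading and act accordingly?
1. When awk reads exactly two files, two methods are in common use.
One is awk 'NR==FNR{...}NR>FNR{...}'  file1 file2 or awk 'NR==FNR{...}NR!=FNR{...}' file1 file2
The other is awk 'NR==FNR{...;next}{...}' file1 file2
Once you know what the built-in variables FNR and NR mean, it is easy to see how both methods work.

FNR       The input record number in the current input file.    # number of records read so far from the current file
NR       The total number of input records seen so far.          # total number of records read so far
next Stop processing the current input record.  The next input record  is
      read  and  processing  starts over with the first pattern in the AWK
      program.  If the end of the input data is reached, the END block(s),
      if any, are executed.

For awk 'NR==FNR{...}NR>FNR{...}'  file1 file2:
While file1 is being read, the number of records read from file1 (FNR) always equals the total number of records awk has read (NR), because file1 is the first file awk reads. So the first block {...} runs while file1 is read.
While file2 is being read, the total number of records read (NR) is always greater than the number read from file2 (FNR). So the second block {...} runs while file2 is read.

For awk 'NR==FNR{...;next}{...}' file1 file2:
While file1 is being read, NR==FNR holds, so the first block runs; because it contains next, the second block {...} is never reached.
While file2 is being read, NR==FNR does not hold, so the first block {..} does not run and only the second block {...} does.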
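
The NR==FNR idiom with next is most often used to load a lookup table from the first file and filter the second. A self-contained sketch follows; the file names allow and data and their contents are made up for illustration. One caveat worth knowing: if the first file is empty, NR==FNR is also true for the second file, so the idiom misfires.

```shell
tmp=$(mktemp -d)
printf 'alice\nbob\n' > "$tmp/allow"                 # first file: keys to keep
printf 'alice 1\ncarol 2\nbob 3\n' > "$tmp/data"     # second file: records to filter
# First file: remember each key and skip to the next record.
# Second file: the bare pattern prints records whose first field was remembered.
out=$(awk 'NR == FNR { ok[$1] = 1; next } $1 in ok' "$tmp/allow" "$tmp/data")
printf '%s\n' "$out"
# alice 1
# bob 3
rm -rf "$tmp"
```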

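2. When awk processes more than two files, the method above clearly no longer applies, because NR>FNR (NR!=FNR) also holds while the third and later files are read, so they cannot be told apart.
A more general method is needed:
1. ARGIND, the index of the argument currently being processed (a gawk extension): awk 'ARGIND==1{...}ARGIND==2{...}ARGIND==3{...}... ' file1 file2 file3 ...
2. ARGV, the array of command-line arguments: awk 'FILENAME==ARGV[1]{...}FILENAME==ARGV[2]{...}FILENAME==ARGV[3]{...}...' file1 file2 file3 ...
3. Putting the file names directly into the conditions: awk 'FILENAME=="file1"{...}FILENAME=="file2"{...}FILENAME=="file3"{...}...' file1 file2 file3 ...
# less general than the first two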
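
A runnable sketch of the ARGV variant, which does not depend on gawk. The file names f1, f2, f3 and their contents are made up. Note that this comparison cannot tell two arguments apart if the same file is named twice.

```shell
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/f1"
printf 'y\ny\n' > "$tmp/f2"
printf 'z\n' > "$tmp/f3"
# Each rule fires only while the file named by the matching argument is being read.
out=$(cd "$tmp" && awk '
  FILENAME == ARGV[1] { print "first: "  $0 }
  FILENAME == ARGV[2] { print "second: " $0 }
  FILENAME == ARGV[3] { print "third: "  $0 }
' f1 f2 f3)
printf '%s\n' "$out"
# first: x
# second: y
# second: y
# third: z
rm -rf "$tmp"
```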

Author: tt_yy123 Time: 2011-11-07 11:13
Reply to 36# Shell_HAT
cat aa
1
2
3
4
5

awk '$1==3{printf "|| "$0}{printf " @@ "$0}{print $0}' aa
1@@ 1
2@@ 2
3@@ 3
4@@ 4
5@@ 5

I don't get it. How is this executed?
Aren't the commands that follow supposed to run only when $1 equals 3?
awk '$1==3{printf "|| "$0}' aa
awk '$1==3{printf "|| "$0}{printf " @@ "$0}' aa
Why do these print nothing at all when run?

Author: Shell_HAT Time: 2011-11-07 11:24
Reply to 37# tt_yy123

dos2unix aa | awk '$1==3{printf "|| "$0}'

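Author: tt_yy123 Time: 2011-11-07 11:32
Reply to 38# Shell_HAT

dos2unix aa | awk '$1==3{printf "|| "$0}'

---> Thank you, my friend!!!

Author: pu57 Time: 2011-11-07 13:53
Young man, you have plenty of drive for learning. Keep it up!

Author: yuloveban Time: 2011-11-07 22:22
Reply to 1# yinyuemi

Thank you very much for sharing!

Author: yuloveban Time: 2011-11-07 22:37
Reply to 1# yinyuemi

OP, could you explain awk -v RS="#" '{$1=$1;print $0}' in detail? How should $1=$1 be understood? I just read the thread you linked and could not follow it!

Author: yinyuemi Time: 2011-11-08 00:52
Last edited by yinyuemi on 2011-11-08 00:54

Reply to yinyuemi

OP, could you explain awk -v RS="#" '{$1=$1;print $0}' in detail? How should $1=$1 be understood? I just ...
yuloveban posted on 2011-11-07 22:37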

http://www.gnu.org/s/gawk/manual/gawk.html#Fields
Advanced Notes: Understanding $0

It is important to remember that $0 is the full record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other characters) that separate the fields.

It is a not-uncommon error to try to change the field separators in a record simply by setting FS and OFS, and then expecting a plain ‘print’ or ‘print $0’ to print the modified record.

But this does not work, since nothing was done to change the record itself. Instead, you must force the record to be rebuilt, typically with a statement such as ‘$1 = $1’, as described earlier.

$1=$1 makes OFS take effect
echo '1#1#1
2#2#2' |awk -vRS="#" '{$1=$1;print $0}'
1
1
1 2 # OFS defaults to a space, and it took effect! Put another way, any action that assigns to a field makes OFS take effect, for example $2=$2 or NF+=0;
2
2
Two more examples:
echo '1 1 1
2 2 2' |awk 'NR==1{OFS=":";$1=$1;print}NR==2{OFS="#";$1=$1;print}'
1:1:1
2#2#2
echo '1 1 1
2 2 2' |awk 'NR==1{OFS=":";$0=$0;print}NR==2{OFS="#";$0=$0;print}' # assigning $0=$0 does not make OFS take effect.
1 1 1
2 2 2

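Author: yuloveban Time: 2011-11-08 09:28
Reply to 43# yinyuemi

Thanks.

Author: rayzhang11 Time: 2011-11-08 13:08
A great summary, thank you.

Author: xiaopan3322 Time: 2011-11-17 20:12
Regarding point 4, I found this while reading the manual today; pasting it here: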

Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:

$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0

This forces awk to rebuild the record. It does help to add a comment, as we've shown here.

There is a flip side to the relationship between $0 and the fields. Any assignment to $0 causes the record to be reparsed into fields using the current value of FS. This also applies to any built-in function that updates $0, such as sub() and gsub() (see String Functions).

Advanced Notes: Understanding $0
It is important to remember that $0 is the full record, exactly as it was read from the input. This includes any leading or trailing whitespace, and the exact whitespace (or other characters) that separate the fields.

It is a not-uncommon error to try to change the field separators in a record simply by setting FS and OFS, and then expecting a plain ‘print’ or ‘print $0’ to print the modified record.

But this does not work, since nothing was done to change the record itself. Instead, you must force the record to be rebuilt, typically with a statement such as ‘$1 = $1’, as described earlier.

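Author: yinyuemi Time: 2011-11-17 23:57
Reply to 46# xiaopan3322

Thanks, Bob

Author: manganese_zh Time: 2011-11-18 09:03
Nice, a great post

Author: freeterman Time: 2012-05-15 14:08
10. How should awk '!a[$0]++' be read?

This is a classic awk one-liner for removing duplicate lines. It is short, but it touches quite a few concepts, explained one by one below:

<1>: "!" is logical NOT.

<2>: a[$0] is an element of the array a, with $0 as its index.

<3>: a[$0]++ assigns to the array a, i.e. a[$0]+=1

<4>: Put together, how does awk evaluate !a[$0]++? Here is a concrete example:

cat file
111
222
111
222
333

awk '{print a[$0],!a[$0]++,a[$0],!a[$0],$0}' file
 1 1 0 111
 1 1 0 222
1 0 2 0 111
1 0 2 0 222
 1 1 0 333

It turns out that the first a[$0] is empty, so !a[$0]++ performs the test first, giving 1 (the negation of an empty value is true, i.e. 1), and only then performs the increment a[$0]++. This is why the earlier !a[$0]++ does not necessarily equal the later !a[$0].

awk '++a[$0]==1' does the same thing as the code above. Do you see why?

I don't understand.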

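Author: ulovko Time: 2012-05-29 15:10
A great summary, thanks for sharing!

Author: xiangbei1573 Time: 2012-05-29 15:32
I am studying this right now, and the OP's approach looks really good.

Author: vincent_lnfsddy Time: 2012-06-11 13:52
Thanks to the OP for sharing!

Author: unandy Time: 2012-06-13 18:34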
12># echo -e "1\n2\n3\n4\n5" | awk '$1==3{print "|| "$0}{printf " @@ "$0}{print $0}'
@@ 11
@@ 22
|| 3
@@ 33
@@ 44
@@ 55
～～～～～～～～～～～～～
16># echo '1
2
3
4
5' | awk '$1==3{print "|| "$0}{printf " @@ "$0}{print $0}'
@@ 11
@@ 22
|| 3
@@ 33
@@ 44
@@ 55

Why?

Author: yinyuemi Time: 2012-06-13 23:36
Reply to 53# unandy

awk '$1==3{printf "|| "$0}{printf " @@ "$0}{print $0}'

Author: yinyuemi Time: 2012-06-13 23:37
Last edited by yinyuemi on 2012-06-14 09:02

Reply to 53# unandy

---

Author: unandy Time: 2012-06-14 10:56
Reply to 54# yinyuemi
Got it! Thanks!

Author: davidbeckham921 Time: 2012-06-28 20:02
Studying this! Great post!

Author: ulovko Time: 2012-07-10 07:24
Your article is a really good summary! ^_^

Author: guoweiqust Time: 2012-07-12 13:57
Reply to 1# yinyuemi

Which field did you major in, yy?

Author: guoweiqust Time: 2012-07-13 16:05
Reply to 1# yinyuemi

awk '{split("123-456-789",parts_array,"-");print parts_array[1]"\n"parts_array[2]}'

This does not work, but the one below does:

awk 'BEGIN{split("123-456-789",parts_array,"-");print parts_array[1]"\n"parts_array[2]}'

What role does BEGIN play here?? Please advise...

Author: yinyuemi Time: 2012-07-13 23:43
Last edited by yinyuemi on 2012-07-13 23:44

Reply to 60# guoweiqust

Gawk executes AWK programs in the following order.  First, all variable assignments specified via the -v
   option  are  performed.  Next, gawk compiles the program into an internal form. Then, gawk executes the
   code in the BEGIN block(s) (if any), and then proceeds to read each file named in the  ARGV  array. If
   there are no files named on the command line, gawk reads the standard input.
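
In practical terms, the version without BEGIN is a main rule, so it waits for an input record, and with no file argument that means waiting on standard input. A sketch of both forms; feeding one empty line with echo is just one way to give the main rule a record to act on.

```shell
# BEGIN runs before any input is read, so no input is needed at all.
out1=$(awk 'BEGIN { n = split("123-456-789", parts, "-"); print n, parts[1], parts[2] }')
# A main rule runs once per input record; echo supplies exactly one (empty) record.
out2=$(echo | awk '{ split("123-456-789", parts, "-"); print parts[1] }')
printf '%s\n%s\n' "$out1" "$out2"
# 3 123 456
# 123
```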

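Author: guoweiqust Time: 2012-07-15 15:45
Last edited by guoweiqust on 2012-07-15 15:54

Reply to 61# yinyuemi

After reading it, this is my understanding:
If awk has no input file, it reads from stdin; if there is an input file, it reads and processes the contents of that file directly.
But when there is a BEGIN block, awk runs the BEGIN action first, whether or not an input file exists. After BEGIN has finished,
if further actions follow, input is required (the END block also needs input in order to run).

Is this understanding correct? Thanks

Author: yinyuemi Time: 2012-07-15 23:52
Reply to 62# guoweiqust

That is right. If you are interested, take a look at the BEGINFILE and ENDFILE blocks of gawk 4.0 as well; they are worth trying when processing multiple files and can sometimes work wonders.

http://www.gnu.org/software/gawk/manual/html_node/BEGINFILE_002fENDFILE.html

Author: guoweiqust Time: 2012-07-16 09:16
Last edited by guoweiqust on 2012-07-16 09:16

Reply to 63# yinyuemi

I did not really understand it~~
Examples would make it much easier to learn, haha~~
The output of man gawk does not mention BEGINFILE or ENDFILE

Author: guoweiqust Time: 2012-07-16 11:10
Last edited by guoweiqust on 2012-07-16 11:11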

[sam@chenwy sam]$ echo "65" | awk '{printf "%c\n",$0}'
A

This gives A, but why does it not work the other way round?

x900010-p0024@login1-58% awk 'BEGIN{printf "%d\n",A}'
0
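
The thread leaves this question unanswered, so here is a hedged sketch. In printf "%d\n", A the bare A is an uninitialised variable, not the string "A", which is why 0 is printed. awk has no built-in function for the reverse of %c; one common workaround is to build a lookup table with sprintf. The array name ord and the 1..127 range are choices made for this example.

```shell
out=$(awk 'BEGIN {
  # Map each ASCII character to its code by formatting the code with %c.
  for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i
  print ord["A"]
}')
printf '%s\n' "$out"
# 65
```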

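Author: 星火2012 Time: 2012-07-16 14:07
So many experts here; learning from everyone.

Author: guoweiqust Time: 2012-07-16 17:25
Reply to 1# yinyuemi

A question:
are awk arrays somewhat like hashes in Perl?

Author: yinyuemi Time: 2012-07-16 23:38
Reply to 67# guoweiqust

In use they should be about the same; see http://bbs.chinaunix.net/thread-2312439-1-1.html

Author: guoweiqust Time: 2012-07-17 17:20
Last edited by guoweiqust on 2012-07-17 17:23

Reply to 1# yinyuemi

I am new to shell and am reading the "Shell Basics in Twelve Chapters" series. There is something about find that I do not understand; please advise..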

find /usr/sam -path "/usr/sam/dir1" -prune -o -name datafile -print

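The part after -o runs when -path "/usr/sam/dir1" -prune is false. I cannot work out how true and false are decided here.

The directory /usr/sam/dir1 does exist.
Why is the expression (-path "/usr/sam/dir1" -prune) itself false?

Author: guoweiqust Time: 2012-07-17 17:24
Reply to 68# yinyuemi
Thanks for the pointer.. lesson learned

Author: yinyuemi Time: 2012-07-18 00:00
Last edited by yinyuemi on 2012-07-18 00:01

Reply to 69# guoweiqust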

-path pattern
            File name matches shell pattern  pattern. The  metacharacters  do  not
            treat `/' or `.' specially; so, for example,
                     find . -path "./sr*sc"
            will print an entry for a directory called `./src/misc' (if one exists).
            To ignore a whole directory tree, use -prune rather than checking  every
            file  in  the  tree.  For example, to skip the directory `src/emacs' and
            all files and directories under it, and print the  names  of  the  other
            files found, do something like this:
                     find . -path ./src/emacs -prune -o -print
            Note  that the pattern match test applies to the whole file name, starting
            from one of the start points named on the command  line. It  would
            only  make sense to use an absolute path name here if the relevant start
            point is also an absolute path.  This means that this command will never
            match anything:
                     find bar -path /foo/bar/myfile -print
            Find  compares  the -path argument with the concatenation of a directory
            name and the base name of the file it's examining.  Since the concatenation
            will never end with a slash, -path arguments ending in a slash will
            match nothing (except perhaps a start point  specified  on  the  command
            line). The predicate -path is also supported by HP-UX find and will be
            in a forthcoming version of the POSIX standard.

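Author: guoweiqust Time: 2012-07-18 09:17
Reply to 71# yinyuemi

Is this what the passage is saying:
1> The first paragraph is from the man page and describes what -prune does
2> The second paragraph says that if the start point is an absolute path, -path cannot use a relative path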

x900010-p0024@login1-565% find chinaunix -path chinaunix/awk_basic -prune -o -print
chinaunix
chinaunix/datafile
chinaunix/repeat

and

x900010-p0024@login1-566% find chinaunix -path ./chinaunix/awk_basic -prune -o -print
chinaunix
chinaunix/datafile
chinaunix/awk_basic
chinaunix/awk_basic/grade.txt
chinaunix/awk_basic/student_tot.awk
chinaunix/awk_basic/temp
chinaunix/awk_basic/strip
chinaunix/awk_basic/grade_student.txt
chinaunix/awk_basic/grade_student.awk
chinaunix/repeat

3> The third paragraph says that, when the concatenated path is checked, the -path argument must not end with a slash

x900010-p0024@login1-567% find chinaunix -path chinaunix/awk_basic/ -prune -o -print
chinaunix
chinaunix/datafile
chinaunix/awk_basic
chinaunix/awk_basic/grade.txt
chinaunix/awk_basic/student_tot.awk
chinaunix/awk_basic/temp
chinaunix/awk_basic/strip
chinaunix/awk_basic/grade_student.txt
chinaunix/awk_basic/grade_student.awk
chinaunix/repeat

Is that right?
4> But I still cannot see what this has to do with -o. When I replace -o with -a I get a different result

x900010-p0024@login1-571% find chinaunix -path chinaunix/awk_basic -prune -o -print
chinaunix
chinaunix/datafile
chinaunix/repeat

With -a instead:

x900010-p0024@login1-570% find chinaunix -path chinaunix/awk_basic -prune -a -print
chinaunix/awk_basic

Please advise...

Author: yinyuemi Time: 2012-07-18 10:46
Reply to 72# guoweiqust

-a and -o are logical operators, equivalent to "&&" and "||"

-prune True; if the file is a directory, do not descend into it. If -depth is given, false; no effect.
Because -delete implies -depth, you cannot usefully use -prune and -delete together.

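Above is the description of -prune. Note that its value is always true, unless -depth is given.
Also, the way find works can be understood as evaluating, for each file, the combined logical result of all the expressions that follow.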
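
The evaluation can be traced on a throwaway directory tree; the names top, skip, keep and datafile below are made up. For top/skip, -path is true, so -prune runs and is true, and -o then skips its right-hand side, which is why the directory is neither descended into nor printed. For every other path, -path is false, so the left side of -o is false and the right side (-name datafile -print) is evaluated.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/top/skip" "$tmp/top/keep"
touch "$tmp/top/skip/datafile" "$tmp/top/keep/datafile"
# Parsed as: ( -path top/skip -a -prune ) -o ( -name datafile -a -print )
out=$(cd "$tmp" && find top -path top/skip -prune -o -name datafile -print)
printf '%s\n' "$out"
# top/keep/datafile
rm -rf "$tmp"
```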

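Author: bbyy2006 Time: 2012-07-31 16:35

Awesome, learned a lot.

Author: linuxforlive Time: 2012-09-13 15:37
Thanks for sharing; studying it.

Author: davidbeckham921 Time: 2012-11-05 16:10
I read it again and still learned something. A lot of this needs revisiting to really sink in.

Author: lsc53477426 Time: 2013-01-12 19:50
I am not a computer science major either; learning from the OP!

Author: wild_li Time: 2013-03-12 10:10
Learned something....

Author: instarttime Time: 2013-03-13 10:19
Bookmarked....

Author: shouyu924 Time: 2013-04-27 14:47
Hats off. How could a thread like this be allowed to sink?

Author: xmi Time: 2013-05-16 23:50
Very useful... studying ...

Author: 心若寒江雪 Time: 2013-06-27 09:51
Read it again and learned something again.

Author: sanshuike Time: 2013-06-27 18:22

Hehe.

Author: bookxiao Time: 2013-07-11 11:05
Learned something; marking this~~

Author: hanfeng122525 Time: 2013-08-14 21:48
Single quotes suppress the special meaning of every character, so naturally they can protect double quotes too.

Double quotes suppress every character except a few special ones, so they can protect single quotes as well.

' "" '

" '' "

Author: bingling512 Time: 2013-09-13 09:03
Good post, learned something.

Author: wangchaoqun1984 Time: 2013-10-12 16:08
mark, mark

Author: zw421961 Time: 2013-12-05 15:00
A nice summary; studying it.

Author: jeffreyst Time: 2014-04-28 14:20
Good post, marking it.

Author: 网速20M Time: 2014-10-12 13:01
!a[$0]++
The order is ++ first, then the ! test. Because i++ returns the value i had before the increment, you get this effect.

Author: 李满满 Time: 2014-10-12 13:36
Student here: bookmarked~ hehe

Author: 刺客阿地 Time: 2014-10-15 13:07
Respect. Here to learn.

Author: 刺客阿地 Time: 2014-10-16 15:14
Just finished reading this thread and gained a lot. Thanks.

Author: Monday0 Time: 2014-11-20 15:17

Author: Monday0 Time: 2014-11-20 15:18

lol

Author: helloclei Time: 2015-03-19 14:33
I have seen how powerful awk is. The deeper I study, the less I feel I know; I need to keep learning.

Author: jemna Time: 2015-05-02 22:32
Lately my work does not stand out, my results do not stand out, only my lumbar disc does.

Author: xiaojie83_cu Time: 2015-06-09 17:10
Very detailed; studied it carefully.

Author: toddhai Time: 2016-04-27 10:58
MARK~MARK~MARK

Author: hz_oracle Time: 2016-05-24 13:46
A very thorough summary; I got a lot out of reading it.

Welcome to Chinaunix (http://bbs.chinaunix.net/)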