Chinaunix

Title: gawk 4.0.0 release!!!

Author: yinyuemi    Time: 2011-07-01 04:29
Last edited by yinyuemi on 2011-07-07 01:53

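You can download it from the GNU FTP site and try it out: ftp://ftp.gnu.org/gnu/gawk. From a quick read of the announcement, the new gawk is considerably more powerful!!!
Below are the new features in gawk 4.0.0 (I tested some of them):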
http://lists.gnu.org/archive/html/info-gnu/2011-06/msg00013.html

   Copyright (C) 2010, 2011 Free Software Foundation, Inc.

   Copying and distribution of this file, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.

Changes from 3.1.8 to 4.0.0
---------------------------

1. The special files /dev/pid, /dev/ppid, /dev/pgrpid and /dev/user are
   now completely gone. Use PROCINFO instead.

2. The POSIX 2008 behavior for `sub' and `gsub' are now the default.
   THIS CHANGES BEHAVIOR!!!!

  echo '11122211' |awk '{sub(/1{3}/,"")}1'
  22211

3. The \s and \S escape sequences are now recognized in regular expressions.

  echo '111 222  11' |awk '{gsub(/\s/,"")}1'
  11122211

4. The split() function accepts an optional fourth argument which is an array
   to hold the values of the separators.

  echo '111-222|33' |awk '{split($0,a,/[-|]/,seps);print "a[1] = "a[1] RS "a[2] = "a[2] RS "a[3] = "a[3] RS "seps[1] = "seps[1] RS "seps[2] = "seps[2]}'
  a[1] = 111
  a[2] = 222
  a[3] = 33
  seps[1] = -
  seps[2] = |

5. New -b / --characters-as-bytes option that means "hands off my data"; gawk
   won't try to treat input as a multibyte string.

6. New --sandbox option; see the doc.

  --sandbox
      Disable the system() function, input redirections with getline, output redirections with print and printf, and dynamic extensions. This is particularly useful when you want to run awk scripts from questionable sources and need to make sure the scripts can't access your system (other than the specified input data file).

7. Indirect function calls are now available.

--With indirect function calls, you tell gawk to use the value of a variable as the name of the function to call.

8. Interval expressions are now part of default regular expressions for
   GNU Awk syntax.

9. --gen-po is now correctly named --gen-pot.

10. switch / case is now enabled by default. There's no longer a need
    for a configure-time option.
--Control flow in the switch statement works as it does in C.

  seq 10 |awk '{switch ($0%2){
      case "0":
          print "even number: "$0;break
      default:
          print "odd number: "$0
      }
  }'
  odd number: 1
  even number: 2
  odd number: 3
  even number: 4
  odd number: 5
  even number: 6
  odd number: 7
  even number: 8
  odd number: 9
  even number: 10

11. Gawk now supports BEGINFILE and ENDFILE. See the doc for details.

--The body of the BEGINFILE rules is executed just before gawk reads the first record from a file. FILENAME is set to the name of the current file, and FNR is set to zero.
--The ENDFILE rule is called when gawk has finished processing the last record in an input file. For the last input file, it will be called before any END rules. (These two features are really cool, especially when processing multiple files, as in the following:)

  head f1 f2
  ==> f1 <==
  aaa
  bbb
  ccc

  ==> f2 <==
  aaa
  bbb
  ccc

  awk 'BEGIN{print"BEGIN: ---"}BEGINFILE{print "\nBEGINFILE: +++"}{print}ENDFILE{print"ENDFILE: +++\n"}END{print"END: ---"}' f1 f2
  BEGIN: ---

  BEGINFILE: +++
  aaa
  bbb
  ccc
  ENDFILE: +++


  BEGINFILE: +++
  aaa
  bbb
  ccc
  ENDFILE: +++

  END: ---

12. Directories named on the command line now produce a warning, not
    a fatal error, unless --posix or --traditional.

13. The new FPAT variable allows you to specify a regexp that matches
    the fields, instead of matching the field separator. The new patsplit()
    function gives the same capability for splitting.

--The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field.

  echo '111-222|33' |awk -vFS="[-|]" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
  $1 = 111
  $2 = 222
  $3 = 33

  # What about using FPAT instead?

  echo '111-222|33' |awk -vFPAT="[^-|]+" '{print "$1 = "$1 RS "$2 = "$2 RS "$3 = "$3}'
  $1 = 111
  $2 = 222
  $3 = 33

14. All long options now have short options, for use in `#!' scripts.

15. Support for IPv6 added via /inet6/... special file. /inet4/... forces
    IPv4 and /inet chooses the system default (probably IPv4).

16. Added a warning for /[:space:]/ that should be /[[:space:]]/.

17. Merged with John Haque's byte code internals. Adds dgawk debugger and
    possibly improved performance.

18. `break' and `continue' are no longer valid outside a loop, even with
    --traditional.

19. POSIX character classes work with --traditional (BWK awk supports them).

20. Nuked redundant --compat, --copyleft, and --usage long options.

21. Arrays of arrays added. See the doc. (This one is even more powerful!)

  awk 'BEGIN{arr["a"]["b"]=1;arr["a"]["c"]=2;
      for( i in arr)
          for( j in arr[i])
              print i,j,arr[i][j]
  }'
  a b 1
  a c 2

22. Per the GNU Coding Standards, dynamic extensions must now define
    a global symbol indicating that they are GPL-compatible. See
    the documentation and example extensions.
    THIS CHANGES BEHAVIOR!!!!

23. In POSIX mode, string comparisons use strcoll/wcscoll.
    THIS CHANGES BEHAVIOR!!!!

24. The option for raw sockets was removed, since it was never implemented.

25. If not in POSIX mode, gawk turns ranges of the form [d-h] into
    [defgh] before compiling a regexp.  Maybe this will stop all the
    questions about [a-z] matching uppercase letters.
    THIS CHANGES BEHAVIOR!!!!

26. PROCINFO["strftime"] now holds the default format for strftime().

27. Updated to latest infrastructure: Autoconf 2.68, Automake 1.11.1,
    Gettext 0.18.1, Bison 2.5.

28. Many code cleanups. Removed code for many old, unsupported systems:
        - Atari
        - Amiga
        - BeOS
        - Cray
        - MIPS RiscOS
        - MS-DOS with Microsoft Compiler
        - MS-Windows with Microsoft Compiler
        - NeXT
        - SunOS 3.x, Sun 386 (Road Runner)
        - Tandem (non-POSIX)
        - Prestandard VAX C compiler for VAX/VMS
        - Probably others that I've forgotten

29. If PROCINFO["sorted_in"] exists, for(iggy in foo) loops sort the
    indices before looping over them.  The value of this element
    provides control over how the indices are sorted before the loop
    traversal starts. See the manual.

30. A new isarray() function exists to distinguish if an item is an array
    or not, to make it possible to traverse multidimensional arrays.

31. asort() and asorti() take a third argument specifying how to sort.
    See the doc.
--

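Author: xiaopan3322    Time: 2011-07-01 11:12
First reply...
Author: zooyo    Time: 2011-07-01 17:26
Notice: the author has been banned or deleted; content automatically hidden
Author: sk1418    Time: 2011-07-01 20:24
Lots of very nice new features!
One concern, though: when will 4.0 become the version shipped by default? It would be a problem to enjoy the new features on my own machine and then find that the scripts no longer run on the server.
Author: lionfun    Time: 2011-07-01 22:47
Bumping first!
Author: ziyunfei    Time: 2011-07-03 18:31
Compiling under Cygwin...
Author: huazai202    Time: 2011-07-04 10:38
Notice: the author has been banned or deleted; content automatically hidden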
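Author: xiaopan3322    Time: 2011-07-06 14:35
Improvements in gawk 4.0:

1. New command-line options were added
2. Every long option now has a corresponding short option
3. The "--sandbox" option disables system() (as well as redirections and dynamic extensions), so scripts cannot use it to reach the file system
4. The POSIX 2008 behavior of "sub" and "gsub" is now the default
5. Regular expression support was enhanced
6. Other improvements, bug fixes, and code cleanups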
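Author: Shell_HAT    Time: 2011-07-07 01:03
Reply to 1# yinyuemi

Is there an exe that can be used directly on Windows?
Author: yinyuemi    Time: 2011-07-07 01:38
Reply to 9# Shell_HAT

    gawk 4.0.0 supports the Cygwin environment, but it has to be compiled (I did not succeed, though; I don't know whether ziyunfei, who posted above, managed it. You could give it a try.)
Author: Shell_HAT    Time: 2011-07-07 01:42
Reply to 10# yinyuemi

Actually, I'm looking for an exe (preferably a fairly official one ^_^) to give to Windows users who don't know how to use Cygwin, haha.
Author: yinyuemi    Time: 2011-07-07 01:58
Reply to 11# Shell_HAT

    It seems there isn't one, because the current release is also a beta version; there may be an updated release after a while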
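Author: rdcwayx    Time: 2011-07-07 08:49
Reply to yinyuemi

Actually, I'm looking for an exe (preferably a fairly official one ^_^) to give to Windows users who don't know how to use Cygwin, haha.
Shell_HAT posted on 2011-07-07 01:42

You can compile it yourself by following this document:

http://gnuwin32.sourceforge.net/compile.html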
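Author: Shell_HAT    Time: 2011-07-07 08:52
Reply to 13# rdcwayx

It seems I still didn't make myself clear ^_^
If you hand other people something you compiled yourself, some of them may suspect you tampered with it, haha.
Author: rdcwayx    Time: 2011-07-07 09:09
Reply to rdcwayx

It seems I still didn't make myself clear ^_^
If you hand other people something you compiled yourself, some of them may suspect you tampered ...
Shell_HAT posted on 2011-07-07 08:52

You could create your own project on SourceForge or a similar site; once the project exists, you can also invite other enthusiasts to work on it with you.

All of that is open source and public, so there usually won't be a problem.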
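Author: xiaopan3322    Time: 2011-07-10 10:17
You could create your own project on SourceForge or a similar site; once the project exists, you can also invite other ...
rdcwayx posted on 2011-07-07 09:09

    I tried that once back in school and nobody paid any attention; maybe I also just didn't know the process at the time.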
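Author: ziyunfei    Time: 2011-07-11 21:31
Reply to Shell_HAT

    gawk 4.0.0 supports the Cygwin environment, but it has to be compiled (I did not succeed, though; I don't know whether ziyunfei, who posted above, ...
yinyuemi posted on 2011-07-07 01:38

    It compiled successfully; I've uploaded it for everyone to use.

gawk.rar

368.49 KB, downloads: 65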

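Author: yinyuemi    Time: 2011-07-11 23:54
Reply to 17# ziyunfei

   Thanks
Author: rdcwayx    Time: 2011-07-12 12:03
It compiled successfully; I've uploaded it for everyone to use.
ziyunfei posted on 2011-07-11 21:31

    Nice. I'm already using it.

Did you compile it by following the document I linked above?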
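Author: ziyunfei    Time: 2011-07-12 12:19
Nice. I'm already using it.

Did you compile it by following the document I linked above?
rdcwayx posted on 2011-07-12 12:03

No, it was very simple. I just compiled it inside Cygwin:
        tar -xpvzf gawk-4.0.0.tar.gz
        cd gawk-4.0.0
        ./configure && make && make check
Just those few lines.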
Author: rdcwayx    Time: 2011-07-12 14:05
Last edited by rdcwayx on 2011-07-12 14:07

A binary compiled this way may have a problem:

Then install Mingw; you'd best use the latest regular release ("Current"). Mingw can be downloaded from its Sourceforge site. You'll need GCC, Binutils and Windows API. Do not install these into the Cygwin directory. Make sure the directory with the GCC and Binutils executables comes before the Cygwin ones in your Path. You cannot use the Cygwin GCC and Binutils, because the executables they create are not native Windows ones, but depend on the Cygwin emulation layer (cygwin1.dll).

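Cygwin is already installed on my computer, so your build runs fine for me.

Could someone who does not have Cygwin installed confirm whether it also runs correctly there?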
Author: Shell_HAT    Time: 2011-07-12 14:06
Reply to 20# ziyunfei


gawk --version
---------------------------
gawk.exe - Unable To Locate Component
---------------------------
This application has failed to start because cygwin1.dll was not found. Re-installing the application may fix this problem.
---------------------------
OK   
---------------------------
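Author: rdcwayx    Time: 2011-07-12 14:08
Thanks for confirming; that must be the problem.

So you still need to go back to the link I gave and follow it through; the gawk.exe built that way is the most portable one.
Author: zhaoke0128    Time: 2011-07-25 13:43
Late to the thread; I'll try it when I get home tonight
Author: xiaopan3322    Time: 2011-07-30 09:25
Just bumping
Author: cherishwz    Time: 2011-08-01 17:14
Great stuff!
Author: java_html    Time: 2011-08-01 18:58
Thumbs up!!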
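Author: cjaizss    Time: 2013-04-18 14:36
Read with interest. What I care about most is arrays of arrays; this feature is really powerful, so many programs can be rewritten for this version, for example
http://bbs.chinaunix.net/thread-4077351-1-1.html
Next comes BEGINFILE and ENDFILE, which are rather interesting.
I'm always cautious about using new features, though; I worry that they count as unorthodox tricks, haha.
Author: liangadty    Time: 2013-04-18 23:10
Looks like it has become more powerful~~~ Was the old one the built-in one?
Author: ziyunfei    Time: 2013-04-19 00:34
Digging up an old thread, I see. Have a look at a short article I translated: http://blog.chinaunix.net/uid-14293861-id-2977155.html
Author: baby_神    Time: 2016-06-08 12:47
CentOS 7 has only now upgraded awk to 4.0.2....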



