awk Tutorial 1

Posted on 2004-12-16 11:50
I don't use awk often, so I keep forgetting its syntax.
Today I came across a short awk tutorial that is simple and full of practical little examples.
http://www.vectorsite.net/indextt.html
[1.0] A Guided Tour Of Awk
    * [1.1] AWK OVERVIEW
    * [1.2] AWK COMMAND-LINE EXAMPLES
    * [1.3] AWK PROGRAM EXAMPLE
[1.0] A Guided Tour Of Awk
v1.0.9 / chapter 1 of 3 / 01 oct 04 / greg goebel / public domain
* This chapter provides an overview of Awk and a quick tour of its use.
[1.1] AWK OVERVIEW
* The Awk text-processing language is useful for such tasks as:

  • Tallying information from text files and creating reports from the results.
  • Adding functions to text editors like "vi".
  • Translating files from one format to another.
  • Creating small databases.
  • Performing mathematical operations on files of numeric data.

Awk has two faces:  it is a utility for performing simple text-processing
tasks, and it is a programming language for performing complex
text-processing tasks.
The two faces are really the same, however.  Awk uses the same mechanisms
for handling any text-processing task, but these mechanisms are flexible
enough to allow useful Awk programs to be entered on the command line, or to
implement complicated programs containing dozens of lines of Awk statements.
Awk statements comprise a programming language.  In fact, Awk is useful for
simple, quick-and-dirty computational programming.  Anybody who can write a
BASIC program can use Awk, although Awk's syntax is different from that of
BASIC.  Anybody who can write a C program can use Awk with little
difficulty, and those who would like to learn C may find Awk a useful
stepping stone, with the caution that Awk and C have significant differences
beyond their many similarities.
There are, however, things that Awk is not.  It is not really well suited
for extremely large, complicated tasks.  It is also an "interpreted"
language -- that is, an Awk program cannot run on its own; it must be
executed by the Awk utility itself.  That means that it is relatively slow,
though it is efficient as interpretive languages go, and that the program
can only be used on systems that have Awk.  There are translators available
that can convert Awk programs into C code for compilation as stand-alone
programs, but such translators have to be purchased separately.
One last item before proceeding:  What does the name "Awk" mean?  The name
is formed from the initials of its authors:  "Aho, Weinberger, &
Kernighan".  Kernighan later noted:  "Naming a language after its authors
... shows a certain poverty of imagination."  The name is reminiscent of
that of an oceanic bird known as an "auk", and so the picture of an auk
often shows up on the cover of books on Awk.
[1.2] AWK COMMAND-LINE EXAMPLES
* It is easy to use Awk from the command line to perform simple operations
on text files.  Suppose I have a file named "coins.txt" that describes a
coin collection.  Each line in the file contains the following information:
  metal  weight in ounces   date minted   country of origin   description
The file has the contents:
   gold     1    1986  USA                 American Eagle
   gold     1    1908  Austria-Hungary     Franz Josef 100 Korona
   silver  10    1981  USA                 ingot
   gold     1    1984  Switzerland         ingot
   gold     1    1979  RSA                 Krugerrand
   gold     0.5  1981  RSA                 Krugerrand
   gold     0.1  1986  PRC                 Panda
   silver   1    1986  USA                 Liberty dollar
   gold     0.25 1986  USA                 Liberty 5-dollar piece
   silver   0.5  1986  USA                 Liberty 50-cent piece
   silver   1    1987  USA                 Constitution dollar
   gold     0.25 1987  USA                 Constitution 5-dollar piece
   gold     1    1988  Canada              Maple Leaf
I could then invoke Awk to list all the gold pieces as follows:
   awk '/gold/' coins.txt
This tells Awk to search through the file for lines of text that contain the
string "gold", and print them out.  The result is:
   gold     1    1986  USA                 American Eagle
   gold     1    1908  Austria-Hungary     Franz Josef 100 Korona
   gold     1    1984  Switzerland         ingot
   gold     1    1979  RSA                 Krugerrand
   gold     0.5  1981  RSA                 Krugerrand
   gold     0.1  1986  PRC                 Panda
   gold     0.25 1986  USA                 Liberty 5-dollar piece
   gold     0.25 1987  USA                 Constitution 5-dollar piece
   gold     1    1988  Canada              Maple Leaf
* This is all very nice, you say, but any "grep" or "find" utility can do the
same thing.  True, but Awk is capable of doing much more.  For example,
suppose I only want to print the description field, and leave all the other
text out.  I could then change my invocation of Awk to:
   awk '/gold/ {print $5,$6,$7,$8}' coins.txt
This yields:
   American Eagle  
   Franz Josef 100 Korona
   ingot   
   Krugerrand   
   Krugerrand   
   Panda   
   Liberty 5-dollar piece
   Constitution 5-dollar piece
   Maple Leaf
This example demonstrates the simplest general form of an Awk program:
   awk <search pattern> {<program actions>}
Awk searches through the input file for each line that contains the search
pattern.  For each of these lines found, Awk then performs the specified
actions.  In this example, the action is specified as:
   {print $5,$6,$7,$8}
The purpose of the "print" statement is obvious.  The "$5", "$6", "$7", and
"$8" are "fields", or "field variables", which store the words in each line
of text by their numeric sequence.  "$1", for example, stores the first word
in the line, "$2" has the second, and so on.  By default, a "word" is
defined as any string of non-blank characters separated by one or more
spaces or tabs; blanks at the start of a line are ignored.
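To see the default field splitting in action, one can feed Awk a single line laid out like "coins.txt". This is a minimal sketch assuming a POSIX-compatible awk (gawk, mawk, or BSD awk); the sample line is deliberately padded with leading blanks, runs of spaces, and one tab (written as \t for printf) to show that they all act as a single separator.

```shell
# One line in the coins.txt layout, with leading blanks, runs of spaces,
# and a tab between the weight and the date.
printf '   gold     1\t1986  USA   American Eagle\n' |
  awk '{ print "metal=" $1, "weight=" $2, "year=" $3, "country=" $4 }'
# prints: metal=gold weight=1 year=1986 country=USA
```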
Since "coins.txt" has the structure:
  metal  weight in ounces   date minted   country of origin   description
-- then the field variables are matched to each line of text in the file as
follows:
   metal:        $1
   weight:       $2
   date:         $3
   country:      $4
   description:  $5 through $8
The program action in this example prints the fields that contain the
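description.  The description field in the file may actually include from one
to four fields, but that's not a problem, since a field beyond the end of the
line is simply an empty string, and "print" outputs nothing for it apart from
the space it places between parameters (hence the trailing blanks on some of
the output lines above).  The astute reader will notice that the "coins.txt" file is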
neatly organized so that the only piece of information that contains multiple
fields is at the end of the line.  This is a little contrived, but that's the
way examples are.
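The empty-field behavior is easy to check by wrapping the output in brackets so the trailing blanks become visible. A minimal sketch, assuming a POSIX awk and using one line from "coins.txt" whose description is a single word:

```shell
# "Krugerrand" is the only description word, so $6, $7 and $8 do not exist.
# Each missing field prints as an empty string, but print still inserts a
# space between parameters, leaving three blanks before the closing bracket.
printf 'gold 1 1979 RSA Krugerrand\n' |
  awk '{ print "[" $5, $6, $7, $8 "]" }'
# prints: [Krugerrand   ]
```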
* Awk's default program action is to print the entire line, which is what
"print" does when invoked without parameters.  This means that the first
example:
   awk '/gold/'
-- is the same as:
   awk '/gold/ {print}'
Note that Awk recognizes the field variable $0 as representing the entire
line, so this could also be written as:
   awk '/gold/ {print $0}'
This is redundant, but it does have the virtue of making the action more
obvious.
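The three spellings of the default action can be compared side by side. This sketch assumes a POSIX awk and feeds two sample lines from "coins.txt" through each program; all three should print the same single line.

```shell
# Two sample lines in the coins.txt layout.
coins='gold 1 1986 USA American Eagle
silver 10 1981 USA ingot'

# Pattern with no action, bare print, and print $0 are equivalent.
printf '%s\n' "$coins" | awk '/gold/'
printf '%s\n' "$coins" | awk '/gold/ {print}'
printf '%s\n' "$coins" | awk '/gold/ {print $0}'
# each prints: gold 1 1986 USA American Eagle
```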
* Now suppose I want to list all the coins that were minted before 1980.  I
invoke Awk as follows:
   awk '{if ($3 < 1980) print $3,"    ",$5,$6,$7,$8}' coins.txt
This yields:
   1908      Franz Josef 100 Korona
   1979      Krugerrand
This new example adds a few new concepts:

  • No search pattern is specified.  Without a search pattern, Awk will match
       all lines in the input file, and perform the actions on each one.
  • I can add text of my own to the "print" statement (in this case, four
       spaces) simply by enclosing the text in quotes and adding it to the
       parameter list.
  • An "if" statement is used to check for a date field earlier than 1980,
       and the "print" statement is executed only if that condition is true.
       There's a subtle issue involved here, however.  In most computer
       languages, strings are strings, and numbers are numbers.  There are
       operations that are unique to each, and one must be specifically converted to
       the other with conversion functions.  You don't concatenate numbers, and
       you don't perform arithmetic operations on strings.
       Awk, on the other hand, makes no strong distinction between strings and
       numbers.  In computer-science terms, it isn't a "strongly-typed"
       language.  All the fields are regarded as strings, but if that string
       also happens to represent a number, numeric operations can be performed
       on it.  So we can perform an arithmetic comparison on the date field.
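The loose typing described above has a practical consequence worth seeing once. In this minimal sketch (assuming a POSIX awk), two fields that look like numbers are compared numerically, which is why a date test against 1980 works; concatenating an empty string ("") onto each field turns them into plain strings, and the comparison becomes character by character, where "9" sorts after "1".

```shell
# Both fields look numeric, so they are compared as numbers: 9 < 10.
echo '9 10' | awk '{ if ($1 < $2) print "yes"; else print "no" }'
# prints: yes

# Appending "" forces string values: "9" < "10" is false as text.
echo '9 10' | awk '{ if ($1 "" < $2 "") print "yes"; else print "no" }'
# prints: no
```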

* The next example prints out how many coins are in the collection:
   awk 'END {print NR,"coins"}' coins.txt
This yields:
   13 coins
The first new item in this example is the END statement.  To explain this, I
have to extend the general form of an Awk program to:
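   awk 'BEGIN              {<initializations>}
        <search pattern 1> {<program actions>}
        <search pattern 2> {<program actions>}
        ...
        END                {<final actions>}'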
The BEGIN clause performs any initializations required before Awk starts
scanning the input file.  The subsequent body of the Awk program consists of
a series of search patterns, each with its own program action.  Awk scans
each line of the input file for each search pattern, and performs the
appropriate actions for each line that matches.  Once the file has been scanned,
an END clause can be used to perform any final actions required.
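The order in which the three kinds of clauses run can be traced with a tiny program. This is a sketch assuming a POSIX awk, fed two sample lines from "coins.txt"; each clause just prints a marker so the sequence is visible.

```shell
printf 'gold 1 1986 USA American Eagle\nsilver 10 1981 USA ingot\n' |
  awk '
    BEGIN    { print "start" }        # once, before the first line is read
    /gold/   { print "gold", $3 }     # for every line containing "gold"
    /silver/ { print "silver", $3 }   # for every line containing "silver"
    END      { print "done" }         # once, after the last line
  '
# prints:
# start
# gold 1986
# silver 1981
# done
```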
So, this example doesn't perform any processing on the input lines
themselves.  All it does is scan through the file and perform a final
action:  print the number of lines in the file, which is given by the "NR"
variable.
NR stands for "number of records".  NR is one of Awk's "pre-defined"
variables.  There are others; for example, the variable NF gives the number
of fields in a line, but a detailed explanation will have to wait for later.
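As a quick preview of NR and NF, the following sketch (assuming a POSIX awk) prints both for each of two sample lines from "coins.txt", then prints NR again in the END clause, where it holds the total number of lines read.

```shell
printf 'gold 1 1986 USA American Eagle\ngold 1 1979 RSA Krugerrand\n' |
  awk '{ print NR, NF } END { print "total", NR }'
# prints:
# 1 6
# 2 5
# total 2
```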
* Suppose the current price of gold is $425, and I want to figure out the
approximate total value of the gold pieces in the coin collection.  I invoke
Awk as follows:
   awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
This yields:
   value = $2592.5
In this example, "ounces" is a variable I defined myself, or a "user defined"
variable.  You can use almost any string of characters as a variable name in
Awk, as long as the name doesn't conflict with some string that has a
specific meaning to Awk, such as "print" or "NR" or "END".  There is no need
to declare the variable, or to initialize it.  A variable handled as a string
variable is initialized to the "null string", meaning that if you try to
print it, nothing will be there.  A variable handled as a numeric variable
will be initialized to zero.
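Both default initial values can be observed without any input file by using a BEGIN-only program. In this sketch (assuming a POSIX awk), "x" is never assigned, and "total" is accumulated with "+=" without first being set to zero.

```shell
# An unassigned variable is empty as a string and 0 as a number.
awk 'BEGIN { print "[" x "]", x + 0 }'
# prints: [] 0

# No initialization is needed before accumulating into a variable.
awk 'BEGIN { total += 2.5; total += 0.5; print total }'
# prints: 3
```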
So the program action:
   {ounces += $2}
-- sums the weight of the piece on each matched line into the variable
"ounces".  Those who program in C should be familiar with the "+=" operator.
Those who don't can be assured that this is just a shorthand way of saying:
   {ounces = ounces + $2}
The final action is to compute and print the value of the gold:
   END {print "value = $" 425*ounces}
The only thing here of interest is that the two print parameters, the literal
'"value = $"' and the expression "425*ounces", are separated by a space, not
a comma.  This concatenates the two parameters together on output, without
any intervening spaces.
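The difference between a space and a comma in a "print" list shows up clearly when the same values are printed both ways. This sketch assumes a POSIX awk and hardcodes 6.1, the total weight of the gold pieces in "coins.txt", instead of reading the file.

```shell
# Parameters written side by side are concatenated with nothing between them.
awk 'BEGIN { ounces = 6.1; print "value = $" 425*ounces }'
# prints: value = $2592.5

# A comma inserts the output field separator (OFS, a single space by default).
awk 'BEGIN { ounces = 6.1; print "value = $", 425*ounces }'
# prints: value = $ 2592.5
```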
[1.3] AWK PROGRAM EXAMPLE
* All this is fun, but each of these examples only seems to nibble away at
"coins.txt".  Why not have Awk figure out everything interesting at one
time?
The immediate objection to this idea is that it would be impractical to enter
a lot of Awk statements on the command line, but that's easy to fix.  The
commands can be written into a file, and then Awk can be told to execute the
commands from that file as follows:
   awk -f <awk program file name>
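A small end-to-end example of the "-f" option follows. It is a sketch assuming a POSIX shell and awk; the program file name "count.awk" is hypothetical, and the input is two sample lines from "coins.txt" piped in on standard input rather than read from a file.

```shell
# Write a tiny program file (count.awk is a made-up name).
cat > count.awk <<'EOF'
# Count the lines that mention gold.
/gold/ { n++ }
END    { print n + 0, "gold lines" }
EOF

# Run the program from the file instead of the command line.
printf 'gold 1 1986 USA American Eagle\nsilver 10 1981 USA ingot\n' |
  awk -f count.awk
# prints: 1 gold lines
```

The "n + 0" in the END clause makes the count print as 0 rather than as an empty string when no line matches.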
Given an ability to write an Awk program in this way, then what should a
"master" "coins.txt" analysis program do?  Here's one possible output:
  Summary Data for Coin Collection:
  
     Gold pieces:                   nn
     Weight of gold pieces:         nn.nn
     Value of gold pieces:       n,nnn.nn
     Silver pieces:                 nn
     Weight of silver pieces:       nn.nn
     Value of silver pieces:     n,nnn.nn
     Total number of pieces:        nn
     Value of collection:        n,nnn.nn
The following Awk program generates this information:
   # This is an awk program that summarizes a coin collection.
   #
   /gold/    { num_gold++; wt_gold += $2 }      # Get weight of gold.
   /silver/  { num_silver++; wt_silver += $2 }  # Get weight of silver.
   END { val_gold = 485 * wt_gold;              # Compute value of gold.
         val_silver = 16 * wt_silver;           # Compute value of silver.
         total = val_gold + val_silver;
         print "Summary data for coin collection:";  # Print results.
         printf ("\n");
         printf ("   Gold pieces:                   %2d\n", num_gold);
         printf ("   Weight of gold pieces:         %5.2f\n", wt_gold);
         printf ("   Value of gold pieces:        %7.2f\n",val_gold);
         printf ("\n");
         printf ("   Silver pieces:                 %2d\n", num_silver);
         printf ("   Weight of silver pieces:       %5.2f\n", wt_silver);
         printf ("   Value of silver pieces:      %7.2f\n",val_silver);
         printf ("\n");
         printf ("   Total number of pieces:        %2d\n", NR);
         printf ("   Value of collection:         %7.2f\n", total); }
This program has a few interesting features:  

  • Comments can be inserted in the program by preceding them with a "#".
  • Note the statements "num_gold++" and "num_silver++".  C programmers
       should understand the "++" operator.  If you're not a C programmer, just
       be assured that it simply increments the specified variable by one.
  • Multiple statements can be written on the same line by separating them
       with a semicolon (";").
  • Note the use of the "printf" statement, which offers more flexible
       printing capabilities than the "print" statement.  "Printf" has the
       general syntax:  
       printf("<format_code>",<parameters>)
       There is one format code for each of the parameters in the list.  Each
       format code determines how its corresponding parameter will be printed.
       For example, the format code "%2d" tells Awk to print an integer in a
       field at least two characters wide, and the format code "%7.2f" tells Awk
       to print a floating-point number in a field at least seven characters
       wide, with two digits to the right of the decimal point.
       Note also that, in this example, each string printed by "printf" ends with
       a "\n", which is a code for a "newline" (ASCII line-feed code).  Unlike
       the "print" statement, which automatically advances the output to the next
       line when it prints a line, "printf" does not automatically advance the
       output, and by default the next output statement will append its output to
       the same line.  A newline forces the output to skip to the next line.
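The field widths are easiest to see with brackets around each formatted value. This sketch assumes a POSIX awk; the numbers are taken from the summary output of the coin program. Note that a width is a minimum: a value wider than the field is printed in full, not truncated.

```shell
# %2d pads to at least 2 characters; %7.2f pads to at least 7 with 2 decimals.
awk 'BEGIN { printf("[%2d] [%7.2f]\n", 9, 2958.5) }'
# prints: [ 9] [2958.50]
awk 'BEGIN { printf("[%2d] [%7.2f]\n", 13, 200) }'
# prints: [13] [ 200.00]
awk 'BEGIN { printf("[%2d]\n", 123) }'
# prints: [123]

# Without "\n", the next printf continues on the same line.
awk 'BEGIN { printf("one "); printf("two\n") }'
# prints: one two
```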

* I stored this program in a file named "summary.awk", and invoked it as
follows:
   awk -f summary.awk coins.txt
The output was:
   Summary data for coin collection:

      Gold pieces:                    9
      Weight of gold pieces:          6.10
      Value of gold pieces:        2958.50

      Silver pieces:                  4
      Weight of silver pieces:       12.50
      Value of silver pieces:       200.00

      Total number of pieces:        13
      Value of collection:         3158.50
* This information should give you enough background to make good use of
Awk.  The next chapter provides a much more complete description of the
language.

This article comes from the ChinaUnix blog; to view the original, visit: http://blog.chinaunix.net/u/2108/showart_8139.html