
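#1

Posted on 2015-09-02 10:31

I'm comfortable pulling different fields out of a single line with awk. But how do I handle a block of text where the output needs different fields from different lines?

For example, given the following text: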

2015-08-31 15:53:59,383 INFO - 进入任务：sendTask ZH05
2015-08-31 15:53:59,384 INFO - 目标号码：1
2015-08-31 15:53:59,384 INFO - FROM:111，TO:222 BEGIN
2015-08-31 15:53:59,384 INFO - 请求长度为:110
2015-08-31 15:53:59,424 INFO - 请求状态:200
2015-08-31 15:53:59,424 INFO - 返回值为：<?xml version="1.0" encoding="utf-8"?>2|14410076429841410</string>
2015-08-31 15:53:59,427 INFO - 返回值解析后为：2|14410076429841410


The required output is:
2015-08-31 15:53:59,ZH05,222,14410076429841410

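Fields are comma-separated:
Column 1 is the timestamp of this block of text
Column 2 is the last field of the first line
Column 3 is the value after TO: on the third line
Column 4 is the ID in the return value (14410076429841410)

I understand how to apply multiple conditions to a single line, but how do I apply different conditions to different lines and print selectively? An example would be appreciated. Thanks.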


MMMIX

#2

Posted on 2015-09-02 10:50

Reply to #1 xiaogui_vip

How are these lines related to each other (what ties them together)? And how do you tell them apart from other similar records?


jason680

#3

Posted on 2015-09-02 10:53

Reply to #1 xiaogui_vip

$ cat FILE
2015-08-31 15:53:59,383 INFO  -  进入任务：sendTask ZH05
2015-08-31 15:53:59,384 INFO  -  目标号码：1
2015-08-31 15:53:59,384 INFO  -  FROM:111，TO:222 BEGIN
2015-08-31 15:53:59,384 INFO  -  请求长度为:110
2015-08-31 15:53:59,424 INFO  -  请求状态:200
2015-08-31 15:53:59,424 INFO  -  返回值为：<?xml version="1.0" encoding="utf-8"?>2|14410076429841410</string>
2015-08-31 15:53:59,427 INFO  -  返回值解析后为：2|14410076429841410

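$ awk '
# Note: the three-argument match() below is a gawk extension.
BEGIN{
  OFS=","                    # join output fields with commas
}
/进入任务/{                   # first line of a block: start a new record
  time = $1" "$2
  sub(/,.+$/, "", time)      # drop the milliseconds (",383")
  task = $NF                 # last field, e.g. ZH05
  to = ""
}
match($0,/TO:([^ ]+)/,a){    # capture the value after "TO:"
  to = a[1]
}
/返回值解析后/{               # last line of a block: emit the record
  sub(/^.*[|]/, "", $NF)     # keep only the ID after "|"
  print time, task, to, $NF
}' FILE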
2015-08-31 15:53:59,ZH05,222,14410076429841410
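
The three-argument form of match() used above is specific to gawk. As a rough sketch for other awk implementations, the same extraction can be written with the two-argument match() plus RSTART and RLENGTH. The helper name sample_log is made up for this illustration; the log lines are the ones from post #1.

```shell
# sample_log is a hypothetical helper that prints the log block from the thread.
sample_log() {
cat <<'EOF'
2015-08-31 15:53:59,383 INFO  -  进入任务：sendTask ZH05
2015-08-31 15:53:59,384 INFO  -  目标号码：1
2015-08-31 15:53:59,384 INFO  -  FROM:111，TO:222 BEGIN
2015-08-31 15:53:59,384 INFO  -  请求长度为:110
2015-08-31 15:53:59,424 INFO  -  请求状态:200
2015-08-31 15:53:59,424 INFO  -  返回值为：<?xml version="1.0" encoding="utf-8"?>2|14410076429841410</string>
2015-08-31 15:53:59,427 INFO  -  返回值解析后为：2|14410076429841410
EOF
}

result=$(sample_log | awk '
BEGIN { OFS = "," }
/进入任务/ {
    time = $1 " " $2
    sub(/,.*$/, "", time)          # drop the milliseconds
    task = $NF
    to = ""
}
match($0, /TO:[^ ]+/) {
    # RSTART/RLENGTH describe the match; skip the 3 characters of "TO:"
    to = substr($0, RSTART + 3, RLENGTH - 3)
}
/返回值解析后/ {
    id = $NF                       # work on a copy so $0 is left untouched
    sub(/^.*[|]/, "", id)
    print time, task, to, id
}')
printf '%s\n' "$result"
```

Copying $NF into id before calling sub is a design choice, not a requirement: it avoids modifying the field, which would make awk rebuild $0 with OFS.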


xiaogui_vip

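#4

Posted on 2015-09-02 10:56

Reply to #2 MMMIX

The lines are related as follows:

As you can probably tell, this is a program's log file. Each time the program runs and enters the sendTask task, it writes one block of log lines like this, with no other log lines mixed in.
If we call this block of text A, the complete log looks like this (无任务需要处理 means "no task to process"):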

A
无任务需要处理
A
无任务需要处理

As for what ties them together: nothing in particular. It is simply the complete log of one task run.
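
With that layout, one way to handle a whole log is to treat the 进入任务 line as the start of a record and the 返回值解析后 line as its end, ignoring everything in between blocks. The following is only a sketch under assumptions: the second block (ZH06, 444 and its ID), the timestamps on the 无任务需要处理 lines, and the names make_log and in_block are invented for illustration, and each block is shortened to the three lines the program actually reads.

```shell
# make_log is a hypothetical helper; the second block and the idle lines are made-up data.
make_log() {
cat <<'EOF'
2015-08-31 15:53:59,383 INFO  -  进入任务：sendTask ZH05
2015-08-31 15:53:59,384 INFO  -  FROM:111，TO:222 BEGIN
2015-08-31 15:53:59,427 INFO  -  返回值解析后为：2|14410076429841410
2015-08-31 15:54:09,000 INFO  -  无任务需要处理
2015-08-31 15:54:19,100 INFO  -  进入任务：sendTask ZH06
2015-08-31 15:54:19,101 INFO  -  FROM:333，TO:444 BEGIN
2015-08-31 15:54:19,150 INFO  -  返回值解析后为：2|14410076429849999
2015-08-31 15:54:29,000 INFO  -  无任务需要处理
EOF
}

result=$(make_log | awk '
BEGIN { OFS = "," }
/进入任务/ {                       # start of a block: reset per-record state
    in_block = 1
    time = $1 " " $2
    sub(/,.*$/, "", time)
    task = $NF
    to = ""
    next
}
!in_block { next }                # skip lines outside any block
match($0, /TO:[^ ]+/) {
    to = substr($0, RSTART + 3, RLENGTH - 3)
}
/返回值解析后/ {                   # end of a block: print and close it
    id = $NF
    sub(/^.*[|]/, "", id)
    print time, task, to, id
    in_block = 0
}')
printf '%s\n' "$result"
```

The in_block flag is there so that a stray matching line outside a block cannot print a record built from stale values.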


xiaogui_vip

#5

Posted on 2015-09-02 11:08

Reply to #3 jason680

Could you explain what each of the two sub calls does?


MMMIX

#6

Posted on 2015-09-02 11:08

Reply to #4 xiaogui_vip

You should state implicit conditions explicitly rather than leaving others to guess. If they guess wrong, you have wasted their time.


xiaogui_vip

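#7

Posted on 2015-09-02 11:10

Last edited by xiaogui_vip on 2015-09-02 11:11

Reply to #3 jason680

Ah, I see. After a closer look I understand: because the field separator (FS) splits on whitespace, sub is used to clean up the fields.

Thanks! I'll chew on this some more.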
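
To see what each call does, the two sub calls from post #3 can be tried in isolation on the values they receive. This is a small sketch, not part of the original answer. It assumes default whitespace field splitting, so $2 arrives as 15:53:59,383 and $NF on the last log line is the whole token 返回值解析后为：2|14410076429841410. The variable name last is made up.

```shell
result=$(awk 'BEGIN {
    # First sub: remove everything from the first comma to the end,
    # which strips the milliseconds from the timestamp.
    time = "2015-08-31 15:53:59,383"
    sub(/,.+$/, "", time)
    print time

    # Second sub: remove everything up to and including the "|",
    # leaving only the ID.
    last = "返回值解析后为：2|14410076429841410"
    sub(/^.*[|]/, "", last)
    print last
}')
printf '%s\n' "$result"
```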


xiaogui_vip

#8

Posted on 2015-09-02 11:11

Reply to #6 MMMIX

What do you mean by implicit conditions? I don't follow. Can't it just be treated as a plain block of text?


jason680

#9

Posted on 2015-09-02 11:14

Reply to #5 xiaogui_vip

https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
9.1.3 String-Manipulation Functions

The functions in this section look at or change the text of one or more strings.

....

sub(regexp, replacement [, target])

Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp. Modify the entire string by replacing the matched text with replacement. The modified string becomes the new value of target. Return the number of substitutions made (zero or one).

The regexp argument may be either a regexp constant (/…/) or a string constant ("…"). In the latter case, the string is treated as a regexp to be matched. See Computed Regexps, for a discussion of the difference between the two forms, and the implications for writing your program correctly.

This function is peculiar because target is not simply used to compute a value, and not just any expression will do; it must be a variable, field, or array element so that sub() can store a modified value there. If this argument is omitted, then the default is to use and alter $0. For example:

str = "water, water, everywhere"
sub(/at/, "ith", str)

sets str to ‘wither, water, everywhere’, by replacing the leftmost longest occurrence of ‘at’ with ‘ith’.

If the special character ‘&’ appears in replacement, it stands for the precise substring that was matched by regexp. (If the regexp can match more than one string, then this precise substring may vary.) For example:

{ sub(/candidate/, "& and his wife"); print }

changes the first occurrence of ‘candidate’ to ‘candidate and his wife’ on each input line. Here is another example:

$ awk 'BEGIN {
>       str = "daabaaa"
>       sub(/a+/, "C&C", str)
>       print str
> }'
-| dCaaCbaaa

This shows how ‘&’ can represent a nonconstant string and also illustrates the “leftmost, longest” rule in regexp matching (see Leftmost Longest).

The effect of this special character (‘&’) can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write ‘\\&’ in a string constant to include a literal ‘&’ in the replacement. For example, the following shows how to replace the first ‘|’ on each line with an ‘&’:

{ sub(/\|/, "\\&"); print }

As mentioned, the third argument to sub() must be a variable, field, or array element. Some versions of awk allow the third argument to be an expression that is not an lvalue. In such a case, sub() still searches for the pattern and returns zero or one, but the result of the substitution (if any) is thrown away because there is no place to put it. Such versions of awk accept expressions like the following:

sub(/USA/, "United States", "the USA and Canada")

For historical compatibility, gawk accepts such erroneous code. However, using any other nonchangeable object as the third parameter causes a fatal error and your program will not run.

Finally, if the regexp is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match.
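
As a quick check of the points in the excerpt, the sketch below prints the return value of sub() next to the modified string, and shows the escaped "\\&" replacement. The variable names n, miss and line are invented here; the first call reuses the "daabaaa" example from the manual.

```shell
result=$(awk 'BEGIN {
    str = "daabaaa"
    n = sub(/a+/, "C&C", str)     # & stands for the matched text "aa"
    print n, str

    miss = sub(/z/, "Q", str)     # no match: returns 0, str is unchanged
    print miss, str

    line = "a|b|c"
    sub(/[|]/, "\\&", line)       # "\\&" inserts a literal ampersand
    print line
}')
printf '%s\n' "$result"
```

Only the first match is replaced in each call, which is why the second and third pipe-separated parts of line keep their separator.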


xiaogui_vip

#10

Posted on 2015-09-02 11:17

Reply to #9 jason680

Understood. Thanks.


[Text processing] AWK: how do I output different fields from multiple lines under different conditions?