Chinaunix

Title: A fairly complex awk comparison problem

Author: 殇19921011 Time: 2014-07-31 10:40
Text to compare:
{7106561412,50297001,6075261,571,65000326193972,4740,20140601000000,20140701000000:0,360000}
{7106561412,50297001,6075271,571,65000326193972,4740,20140601000000,20140701000000:0,360000}

{7103210808,50297001,6075271,571,65000242312608,960,20140601000000,20140701000000:0,360000}
{7103210808,50297001,6075271,571,65000242312608,960,20140601000000,20140701000000:1140,360000}

{7106561412,50297001,6075271,571,65000326193972,9000,20140601000000,20140701000000:0,360000}
{7106561412,50297001,6075271,571,65000326193972,9000,20140601000000,20140701000000:4740,360000}

{7103210808,50297001,6075271,571,65000242312608,480,20140601000000,20140701000000:1080,360000}
{7103210808,50297001,6075271,571,65000242312608,480,20140601000000,20140701000000:2220,360000}

{7103210808,50297001,6075271,571,65000242312608,120,20140601000000,20140701000000:960,360000}
{7103210808,50297001,6075271,571,65000242312608,120,20140601000000,20140701000000:2100,360000}

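Compare the two adjacent lines within each group; groups are separated by a blank line. The fields on each line are separated by ",". Ignore the eighth field, i.e. compare everything except the eighth field. If the two lines differ, print both of them. The file contains many such records.

Could an expert please help? I'm waiting online.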

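Author: yestreenstars Time: 2014-07-31 10:48
How exactly should they be compared?
Line 1 with line 2, then line 2 with line 3, and so on?
Or line 1 with line 2, then line 3 with line 4, and so on?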

Author: ly5066113 Time: 2014-07-31 11:01
Reply to 1# 殇19921011

try:

awk -v RS= 'gensub(/,[^,]*/,"",7,$1)!=gensub(/,[^,]*/,"",7,$2)' file


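Author: q1208c Time: 2014-07-31 11:15
What a strange requirement. I would rather write a Perl program for this, so that later on I can still tell how the data is being processed.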

Author: Kasiotao Time: 2014-07-31 11:21
Last edited by Kasiotao on 2014-07-31 11:25

Reply to 1# 殇19921011

awk -v i=0 -F, 'function cmp(){if(i==2&&a[0]!=a[1])print b[0]"\n"b[1];i=0}$0==""{cmp();next}{a[i]=substr($1 $2 $3 $4 $5 $6 $7 $9,2);b[i]=$0;i++}END{cmp()}' testfile


Mine came out rather long... it is certainly not the best solution.

Author: 殇19921011 Time: 2014-07-31 11:38
Reply to 2# yestreenstars

Compare line 1 with line 2, then line 3 with line 4; the pairs are separated by blank lines.

Author: lifayi2008 Time: 2014-07-31 11:46

awk 'BEGIN{FS=","}{a[NR]=$0;b[NR]=$1$2$3$4$5$6$7$9;if(NR%3==2 && b[NR]!=b[NR-1])print a[NR-1]"\n"a[NR]}' c.txt


Author: yestreenstars Time: 2014-07-31 11:47
Reply to 4# q1208c

You're just here to promote Perl~

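Author: Kasiotao Time: 2014-07-31 11:48
Reply to 6# 殇19921011
ly's approach is very elegant and worth studying. It uses blank lines as RS and then compares the lines with gensub, which also leaves the original strings untouched. Awesome.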
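
For anyone unfamiliar with the "blank lines as RS" trick, here is a minimal sketch of what an empty RS (paragraph mode) does. The tiny two-column sample below is made up for illustration and is not data from this thread.

```shell
# Hypothetical sample: two groups of two lines, separated by one blank line.
sample='a,1
a,2

b,1
b,1'

# RS= (empty) switches awk to paragraph mode: each blank-line-separated
# block becomes one record. With the default FS, newlines also split
# fields, so $1 and $2 are the two lines of a block (they contain no spaces).
out=$(printf '%s\n' "$sample" | awk -v RS= '{ print NR ": first=" $1 " second=" $2 }')
printf '%s\n' "$out"
```

Each block is reported as a single record, which is why ly5066113's one-liner can compare $1 with $2 directly.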

Author: 殇19921011 Time: 2014-07-31 11:51
Reply to 9# Kasiotao

gensub does not work on my machine: awk: Function gensub is not defined.
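
gensub is a gawk extension and is not part of POSIX awk, which would explain that error. Below is a rough sketch of the same paragraph-mode comparison without gensub. The helper name key and the inlined two-pair sample are assumptions made for illustration, not something posted in this thread, and it has not been tried on the original poster's awk.

```shell
# First two pairs from the original post, inlined so the sketch is self-contained.
sample='{7106561412,50297001,6075261,571,65000326193972,4740,20140601000000,20140701000000:0,360000}
{7106561412,50297001,6075271,571,65000326193972,4740,20140601000000,20140701000000:0,360000}

{7103210808,50297001,6075271,571,65000242312608,960,20140601000000,20140701000000:0,360000}
{7103210808,50297001,6075271,571,65000242312608,960,20140601000000,20140701000000:1140,360000}'

# RS= gives one record per blank-line-separated pair; FS="\n" makes
# $1 and $2 the two lines of that pair.
result=$(printf '%s\n' "$sample" | awk -v RS= -F'\n' '
# key(line, n): hypothetical helper that rebuilds the line from its
# comma-separated fields with field n left empty.
function key(line, n,    parts, cnt, i, out) {
    cnt = split(line, parts, ",")
    out = ""
    for (i = 1; i <= cnt; i++)
        out = out (i == n ? "" : parts[i]) ","
    return out
}
# Print both lines when they differ anywhere other than field 8.
NF >= 2 && key($1, 8) != key($2, 8) { print $1; print $2 }
')
printf '%s\n' "$result"
```

On this sample only the first pair is printed, because its two lines differ in field 3; the second pair differs only in field 8.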

Author: jason680 Time: 2014-07-31 11:57
Reply to 1# 殇19921011

$ awk -F, 'NF==0{n=0;next}{if(++n==1){a=$0;$8="";b=$0}else{c=$0;$8="";if(b!=$0)print a"\n"c"\n"}}' FILE
{7106561412,50297001,6075261,571,65000326193972,4740,20140601000000,20140701000000:0,360000}
{7106561412,50297001,6075271,571,65000326193972,4740,20140601000000,20140701000000:0,360000}

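Author: q1208c Time: 2014-07-31 13:54
Reply to 8# yestreenstars

Not at all. I have actually been learning Erlang all along, but it is simply not suited to this job.

You should pick the most suitable tool whenever you can. "Suitable" does not only mean that it can solve the problem; it also covers the time it takes to solve it, and how well you know the tool affects that time cost.
Later maintainability is another important consideration. Of course, if the data only needs to be processed once, anything quick will do.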

Author: itfly3 Time: 2014-07-31 14:35
awk -v RS= '{a=$0}gsub(/,[^,]*:[^,]*/,"",$0){if($1!=$2)print a}' t

Author: lklkxcxc Time: 2014-07-31 14:59
Reply to 3# ly5066113
Does the 7 stand for the first 7 fields?

Author: Kasiotao Time: 2014-07-31 18:38
Last edited by Kasiotao on 2014-07-31 18:41

Reply to 14# lklkxcxc

gensub(r, s, h [, t])
    Search the target string t for matches of the regular expression r.
    If h is a string beginning with g or G, then replace all matches of r
    with s. Otherwise, h is a number indicating which match of r to replace.


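No, it is not the first 7 fields; it is the 7th substring that matches the regular expression. The regex matches from a "," up to, but not including, the next ",". The 7th such match happens to be $8 (with "," as FS), which is the field the OP wants to ignore. Remove it, then compare the two adjacent lines.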
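
To make the "7th match" concrete on an awk without gensub, here is a rough POSIX awk sketch. remove_nth is a hypothetical helper, not a built-in, and it only approximates gensub(re, "", n, str): it assumes the pattern is unanchored and never matches the empty string.

```shell
# First line of the sample data from the original post.
line='{7106561412,50297001,6075261,571,65000326193972,4740,20140601000000,20140701000000:0,360000}'

stripped=$(printf '%s\n' "$line" | awk '
# remove_nth(str, re, n): delete only the n-th match of re in str.
# head collects the part of the string that has already been scanned.
function remove_nth(str, re, n,    head, count) {
    head = ""
    while (match(str, re) && RLENGTH > 0) {
        if (++count == n)
            return head substr(str, 1, RSTART - 1) substr(str, RSTART + RLENGTH)
        head = head substr(str, 1, RSTART + RLENGTH - 1)
        str = substr(str, RSTART + RLENGTH)
    }
    return head str    # fewer than n matches: string returned unchanged
}
{ print remove_nth($0, ",[^,]*", 7) }
')
printf '%s\n' "$stripped"
```

The matches are ",50297001", ",6075261", ",571", ",65000326193972", ",4740", ",20140601000000" and then ",20140701000000:0", so removing the 7th one drops exactly the eighth comma-separated field and leaves the line ending in ",20140601000000,360000}".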

Welcome to Chinaunix (http://bbs.chinaunix.net/)