Chinaunix

[Text processing] Filtering between two files

#1, posted 2014-01-11 00:01
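I have two files, 18656536338_fdr_2145.txt and 18656536338_logcdr_092145.txt.
I want to filter 18656536338_fdr_2145.txt so that records whose 14th field does not appear as the 8th field of any record in 18656536338_logcdr_092145.txt are dropped. How can I do that?

The files look like this: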
[root@Zhengwu_merge 18656536338_2014010922]# head 18656536338_logcdr_092145.txt
18656536338|2014-01-09 21:46:11|2014-01-09 21:46:16|2014-01-09 21:46:11|10.14.16.231|58.243.250.141|43741|123.235.33.22|80|http://wireless.mapbar.com/rever ... ing.json?q=bblefelm,afbgefmb&gb=02&ch=GBK&rn=5&nq=2&format=true
18656536338|2014-01-09 21:45:56|2014-01-09 21:46:06|2014-01-09 21:45:56|10.14.16.231|58.243.250.141|62314|202.108.23.233|8000|http://im.tieba.baidu.com:8000/
18656536338|2014-01-09 21:45:52|2014-01-09 21:45:54|2014-01-09 21:45:52|10.14.16.231|58.243.250.141|63542|111.206.37.24|80|http://sdk.imap.baidu.com/sdk/v? ... r=1&screen=(480,800)&dpi=(240,240)&ctm=1389275135837&name=%E6%89%8B%E6%9C%BA%E8%90%A5%E4%B8%9A%E5%8E%85
18656536338|2014-01-09 21:45:55|2014-01-09 21:46:15|2014-01-09 21:45:55|10.14.16.231|58.243.250.141|11228|123.125.38.250|80|http://api.m.renren.com/api/batch/run
18656536338|2014-01-09 21:46:02|2014-01-09 21:46:05|2014-01-09 21:46:02|10.14.16.231|58.243.250.141|29779|202.108.23.85|80|http://hmma.baidu.com/app.gif
18656536338|2014-01-09 21:46:15|2014-01-09 21:46:22|2014-01-09 21:46:15|10.14.16.231|58.243.250.141|43742|123.235.33.22|80|http://wireless.mapbar.com/rever ... ing.json?q=bblefelm,afbgefmb&gb=02&ch=GBK&rn=5&nq=2&format=true
18656536338|2014-01-09 21:46:13|2014-01-09 21:46:15|2014-01-09 21:46:13|10.14.16.231|58.243.250.141|48425|123.125.65.115|80|http://loc.map.baidu.com/sdk.php
18656536338|2014-01-09 21:46:31|2014-01-09 21:46:34|2014-01-09 21:46:31|10.14.16.231|58.243.250.141|10311|60.217.241.30|80|http://wireless.mapbar.com/posit ... awde&idx=10
18656536338|2014-01-09 21:47:35|2014-01-09 21:47:38|2014-01-09 21:47:35|10.14.16.231|58.243.250.141|48440|123.125.65.115|80|http://loc.map.baidu.com/sdk.php
18656536338|2014-01-09 21:48:02|2014-01-09 21:48:04|2014-01-09 21:48:02|10.14.16.231|58.243.250.141|28412|60.217.232.164|80|http://wireless.mapbar.com/posit ... JggD&idx=12

[root@Zhengwu_merge 18656536338_2014010922]# head 18656536338_fdr_2145.txt
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:51.5385150|2014-01-09 21:45:51.5398460|1|68|132|200|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||4758|53|0||sdk.imap.baidu.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:53.6185280|2014-01-09 21:45:53.6199780|1|67|387|454|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||21874|53|0||talk.m.renren.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:54.5791480|2014-01-09 21:45:54.5806720|1|68|112|180|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||49143|53|0||im.tieba.baidu.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:54.9591590|2014-01-09 21:45:54.9605980|1|66|156|222|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||50770|53|0||api.m.renren.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:55.4400440|2014-01-09 21:45:55.4413880|1|64|99|163|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||27035|53|0||api.k.sohu.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:55.9389530|2014-01-09 21:45:55.9402830|1|76|148|224|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||20169|53|0||agentchannel.api.duapp.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:56.0024280|2014-01-09 21:45:56.0040020|1|67|131|198|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||23615|53|0||monitor.uu.qq.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:45:59.7191570|2014-01-09 21:45:59.7207180|1|62|110|172|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||52284|53|0||oc.umeng.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:46:01.2988520|2014-01-09 21:46:01.3002600|1|64|108|172|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||39821|53|0||hmma.baidu.com (A)(Host address)
18656536338|21764|21672|351858054419270|912|2014-01-09 21:46:01.3182010|2014-01-09 21:46:01.3204560|1|69|180|249|2|10.14.16.231|58.242.2.2|0||3gnet|460015455636266|220.206.178.49|220.206.178.77||21044|53|0||wireless.mapbar.com (A)(Host address)
[root@Zhengwu_merge 18656536338_2014010922]#
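
Since the question depends on counting to the right column in long pipe-delimited records, it may help to print each field with its position before filtering. The following is only a sketch: the record in `line` is a made-up stand-in, and the assumption is that `|` is the only delimiter, as the samples above suggest. (Counting by hand, field 14 of the fdr sample appears to be 58.242.2.2 and field 8 of the logcdr sample is the address just before the port.)

```shell
#!/bin/sh
# Hypothetical short record; note the empty third field, which still counts.
line='a|b||d'

# Print "position: value" for every pipe-separated field of the record.
numbered=$(printf '%s\n' "$line" |
  awk -F'|' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$numbered"

# On the real data the same awk program could be fed from, for example:
#   head -n 1 18656536338_fdr_2145.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

Empty fields keep their position, so the numbering stays aligned with what `$8` and `$14` refer to when `-F'|'` is in effect.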

#2, posted 2014-01-11 01:11
Reply to 1# yuloveban


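Eyeballing it (untested):
  awk -F'|' 'FNR==NR{a[$8]=1;next}a[$14]'  18656536338_logcdr_092145.txt  18656536338_fdr_2145.txt
The -F'|' option is needed because both files are pipe-delimited; with awk's default whitespace splitting, $8 and $14 would not be the columns shown in the samples.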
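
The one-liner above relies on the common two-file awk idiom: while `FNR==NR` holds, awk is still reading the first file, so its keys are collected; every later record belongs to the second file and is printed only if its key was collected. A minimal, commented sketch of that idiom follows, assuming pipe-delimited input. The file names `log.txt` and `fdr.txt`, the array name `seen`, and all record values are hypothetical miniature stand-ins, not the poster's data.

```shell
#!/bin/sh
# Build two tiny made-up inputs in a temporary directory.
tmp=$(mktemp -d)

# log.txt plays the role of the logcdr file: field 8 is the lookup key.
printf '%s\n' \
  'u|t1|t2|t3|10.0.0.1|10.0.0.2|1111|1.1.1.1|80|http://a.example/' \
  'u|t1|t2|t3|10.0.0.1|10.0.0.2|2222|2.2.2.2|80|http://b.example/' > "$tmp/log.txt"

# fdr.txt plays the role of the fdr file: field 14 is compared against the keys.
printf '%s\n' \
  'u|2|3|4|5|6|7|8|9|10|11|12|10.0.0.1|1.1.1.1|0|keep' \
  'u|2|3|4|5|6|7|8|9|10|11|12|10.0.0.1|9.9.9.9|0|drop' > "$tmp/fdr.txt"

kept=$(awk -F'|' '
  FNR == NR { seen[$8] = 1; next }  # first file: remember each field-8 value
  $14 in seen                       # second file: print if field 14 was remembered
' "$tmp/log.txt" "$tmp/fdr.txt")

printf '%s\n' "$kept"
rm -rf "$tmp"
```

The sketch tests membership with `$14 in seen` rather than `a[$14]`; both select the same records here, but `in` does not create an empty array entry for every key it looks up.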

#3, posted 2014-01-11 07:49
At a glance, the reply above looks right.