Thread starter: yu34po

[Text processing] How do I merge several files by format and generate JSON output?

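#21, posted 2013-12-12 11:21
I looked at the text files in your archive. The values after type are separated by a comma plus a space, not by a bare comma, which is why it went wrong. What I need to know is whether the colon in your original text is followed by a tab or by several spaces.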
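
To illustrate the separator point, here is a minimal sketch. The sample line is made up for illustration (modeled on the type: lines quoted in this thread), and `split_types` is a hypothetical helper name. Splitting on a bare comma leaves a leading space on every item after the first, so "observe" and " observe" would be treated as different values; splitting on a comma followed by optional spaces avoids that.

```shell
# split_types SEP: split the value of a sample "type:" line on SEP and print
# each item in brackets so that stray spaces become visible.
# Assumption: the key and the value are separated by a tab.
split_types() {
  printf 'type:\tindex, observe, forecast3d\n' |
    awk -F '\t' -v sep="$1" '{
      n = split($2, f, sep)
      for (i = 1; i <= n; i++) printf "[%s]", f[i]
      print ""
    }'
}

split_types ','     # bare comma: items keep their leading space
split_types ', *'   # comma plus optional spaces: clean items
```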

#22, posted 2013-12-12 11:24
Changing it to this should work:

    awk -F '[: ]+' '/^i/{t=$2}/^ti/{a[t]+=$2}/^ty/{for(i=1;++i<=NF;)if(!f[t,$i]++)b[t]=b[t]?b[t]$i:$i}/^[0-9]/{c[$1]+=$2}END{for(i in a)printf "id:\t%s\ntimes:\t%d\ntype:\t%s\n\n",i,a[i],b[i];for(i in c)printf "%s\t%d\n",i,c[i]}'

#23, posted 2013-12-12 11:32
Reply to #22 yestreenstars

OK, but how do I get the JSON format out of this?
id:             c9f204
times:  8472773
type:           forecast5d,observe      observe,forecast5d

id:             7d34f2
times:  107
type:           index,observe,forecast3d        observe,
The separator in the middle is a tab, right? It also looks like the trailing comma got included.
61.4.184.90     5591422 0
222.186.25.29   1       0
61.4.184.91     5808807 0
Also, why is there an extra 0 at the end of each of these lines?
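
A likely explanation for the extra 0, sketched under the assumption that the IP lines in the real files are tab-separated: with `-F '[: ]+'` a tab is not a field separator, so the whole line becomes `$1` and `$2` is empty. The sum then adds 0, and the final `printf` prints the original line followed by a tab and 0. The helper name `show_extra_zero` is hypothetical; the awk body is the IP part of the command in post #22.

```shell
# show_extra_zero FS: run the IP-counting part of the command on one
# tab-separated sample line, using FS as the awk field separator.
show_extra_zero() {
  printf '61.4.184.90\t5591422\n' |
    awk -F "$1" '/^[0-9]/ { c[$1] += $2 }
                 END { for (i in c) printf "%s\t%d\n", i, c[i] }'
}

show_extra_zero '[: ]+'   # whole line is $1, $2 is empty: a trailing 0 appears
show_extra_zero '\t'      # tab splits the line: IP and count only
```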

#24, posted 2013-12-12 12:19
Last edited by yestreenstars on 2013-12-12 17:51

Reply to #23 yu34po
I modified it for a tab separator:

    awk -F '\t' '/^i/{t=$2}/^ti/{a[t]+=$2}/^ty/{l=split($2,f,", ");for(i=0;++i<=l;)if(!g[t,f[i]]++)b[t]=b[t]?b[t]","f[i]:f[i]}/^[0-9]/{c[$1]+=$2}END{for(i in a)printf "id:\t%s\ntimes:\t%d\ntype:\t%s\n\n",i,a[i],b[i];for(i in c)printf "%s\t%d\n",i,c[i]}'

The output and the test files are attached: 结果和测试文件.rar (1.95 KB, downloaded 13 times)
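
For readers who find the one-liner dense, here is the same logic spread over several lines with descriptive names. This is a readability sketch, not a replacement: the function name `merge_stats` and the variable names are my own, and it assumes the layout shown in this thread (key, tab, value on the id/times/type lines, and tab-separated IP lines).

```shell
# merge_stats FILE...: sum "times" per id, collect the distinct types per id,
# and sum the counts per IP across all input files.
merge_stats() {
  awk -F '\t' '
    /^id:/    { id = $2 }                      # remember the current id
    /^times:/ { times[id] += $2 }              # accumulate its hit count
    /^type:/  {
      n = split($2, parts, ", ")               # types are separated by ", "
      for (i = 1; i <= n; i++)
        if (!seen[id, parts[i]]++)             # keep each type once per id
          types[id] = (types[id] == "" ? parts[i] : types[id] "," parts[i])
    }
    /^[0-9]/  { hits[$1] += $2 }               # IP lines start with a digit
    END {
      for (id in times)
        printf "id:\t%s\ntimes:\t%d\ntype:\t%s\n\n", id, times[id], types[id]
      for (ip in hits)
        printf "%s\t%d\n", ip, hits[ip]
    }
  ' "$@"
}
```

Usage would be something like `merge_stats 1.txt 2.txt > result.txt`. The order of the ids and IPs in the output is unspecified, because awk's `for (key in array)` does not guarantee an order.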

   

#25, posted 2013-12-12 13:21
JSON format converted from result.txt:

    [root@localhost ~]# awk -F '\t' '/^i/{t="[\""$2"\""}/^ti/{t=t","$2}/^ty/{gsub(/^|$/,"\"",$2);gsub(/,/,"\",\"",$2);t=t","$2"]"}/^$/{s=s?s","t:t}/^[0-9]/{s=s",[\""$1"\","$2"]"}END{print "{"s"}"}' result.txt
    {["589a93",1,"observe"],["d30b9f",793989,"forecast5d","observe","index","alarm"],["a1b42a",4,"forecast5d","observe","forecast3h"],["16c070",2063,"forecast1d"],["3d274c",185571,"observe"],["674c66",716,"forecast","index","observe","alarm","air"],["5b9529",43258,"index","observe","forecast3d","null"],["7f9b82",5501,"observe","index","forecast3d"],["f63d32",27917886,"forecast","index","observe","temp","forecast3h","all"],["aa2f1a",2294,"forecast5d"],["7c1429",12555279,"index","forecast5d","observe"],["45029b",1424,"forecast1d"],["d7d2bc",79828,"observe"],["cf2d61",5,"observe","forecast3d","index"],["1f7f03",112503,"index","observe","forecast3d"],["334cfc",5868,"index","forecast3d","alarm"],["8db47c",3999,"forecast","observe"],["320723",556525,"forecast","index","observe","alarm"],["7d34f2",107,"observe","forecast3d","index"],["abb2c8",705,"forecast3d"],["e294c2",344290,"observe","alarm"],["d229f3",7,"forecast5d"],["e9b455",147434,"index","forecast5d","observe","newsl7","newsw7","alarm"],["22c100",18,"air"],["c9f204",8472773,"observe","forecast5d"],["68faa3",648700,"forecast","HBgqxx","forecast3h","alarm","HBxzxxz"],["549d81",4294875,"forecast","index","observe","alarm","calendar"],["c8d6bf",55786,"index","forecast4d","observe","indexen","forecast1d","air"],["5aae5b",257,"index","observe","forecast3d"],["71f520",3846,"index","forecast3d","air"],["2389f4",483317,"observe","index","forecast3d","alarm"],["d5881d",256753,"observe","air"],["7b0712",1,"forecast"],["f1c3f8",36,"index","observe","forecast3d"],["61.4.184.81",11420145],["61.4.184.90",11389674],["61.4.184.82",34287],["61.4.184.91",11395812],["61.4.184.92",11419722],["61.4.184.83",34285],["61.4.184.93",11416059],["198.143.149.145",2],["222.186.58.141",1],["222.186.34.74",6]}
    [root@localhost ~]#
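
One caveat about the output in post #25: bare arrays wrapped in `{...}` are not something a strict JSON parser accepts, since a JSON object needs `"key": value` pairs. The target format the thread starter asked for is not visible in this part of the thread, so the following is only a sketch of one alternative: the same records emitted as an outer JSON array. The function name `to_json_array` is hypothetical, and it assumes the result.txt layout shown earlier (tab after each key, comma-separated types, tab-separated IP lines, numeric counts, no characters that need JSON escaping).

```shell
# to_json_array FILE...: turn "id:/times:/type:" records and "IP<TAB>count"
# lines into a single JSON array of arrays.
to_json_array() {
  awk -F '\t' '
    # Append the record being built (if any) to the output list.
    function close_rec(   sep) {
      if (rec == "") return
      sep = (out == "" ? "" : ",")
      out = out sep rec "]"
      rec = ""
    }
    /^id:/    { close_rec(); rec = "[\"" $2 "\"" }
    /^times:/ { rec = rec "," $2 }
    /^type:/  { n = split($2, p, ",")
                for (i = 1; i <= n; i++) rec = rec ",\"" p[i] "\"" }
    /^[0-9]/  { close_rec()
                sep = (out == "" ? "" : ",")
                out = out sep "[\"" $1 "\"," $2 "]" }
    END       { close_rec(); print "[" out "]" }
  ' "$@"
}
```

With this variant, `to_json_array result.txt` would print something of the form `[["589a93",1,"observe"],...,["222.186.34.74",6]]`.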

#26, posted 2013-12-12 14:29

    [root@centos6-1 txt]# ls
    1.txt  2.txt  test.sh
    [root@centos6-1 txt]# sh test.sh
    {["e294c2","344290","alarm","observe"],["1f7f03","112503","index","forecast3d","observe"],["d30b9f","793989","alarm","index","forecast5d","observe"],["a1b42a","4","forecast5d","forecast3h","observe"],["68faa3","648700","alarm","forecast3h","HBxzxxz","forecast","HBgqxx"],["320723","556525","alarm","index","forecast","observe"],["45029b","1424","forecast1d"],["16c070","2063","forecast1d"],["7c1429","12555279","index","forecast5d","observe"],["5aae5b","257","index","forecast3d","observe"],["c8d6bf","55786","index","forecast4d","air","indexen","forecast1d","observe"],["d5881d","256753","air","observe"],["c9f204","8472773","forecast5d","observe"],["71f520","3846","index","forecast3d","air"],["334cfc","5868","alarm","index","forecast3d"],["2389f4","483317","alarm","index","forecast3d","observe"],["f63d32","27917886","index","all","forecast3h","temp","forecast","observe"],["549d81","4294875","alarm","index","calendar","forecast","observe"],["cf2d61","5","index","forecast3d","observe"],["3d274c","185571","observe"],["674c66","716","alarm","index","air","forecast","observe"],["f1c3f8","36","index","forecast3d","observe"],["aa2f1a","2294","forecast5d"],["d7d2bc","79828","observe"],["d229f3","7","forecast5d"],["7d34f2","107","index","forecast3d","observe"],["e9b455","147434","alarm","index","forecast5d","newsw7","newsl7","observe"],["22c100","18","air"],["abb2c8","705","forecast3d"],["7f9b82","5501","index","forecast3d","observe"],["589a93","1","observe"],["7b0712","1","forecast"],["8db47c","3999","forecast","observe"],["5b9529","43258","index","null","forecast3d","observe"],["61.4.184.91","5587005"],["61.4.184.82","17114"],["61.4.184.90","5591422"],["61.4.184.81","5596914"],["61.4.184.83","17132"],["61.4.184.92","5609534"],["61.4.184.93","5601377"],["61.4.184.81","5823231"],["61.4.184.83","17153"],["61.4.184.93","5814682"],["61.4.184.91","5808807"],["61.4.184.82","17173"],["61.4.184.90","5798252"],["222.186.34.74","6"],["222.186.58.141","1"],["61.4.184.92","5810188"],["198.143.149.145","2"]}
    [root@centos6-1 txt]# cat test.sh
    #!/bin/bash

    # Split the input: IP lines go to ipdata, other non-empty lines to indexdata.
    cat *.txt | awk '/^[0-9]+\./{print > "ipdata";next}!/^$/{print >"indexdata"}'

    # Join each id/times/type triple onto one line, then sum the times and
    # collect the distinct types per id.
    awk '$1=="id:"{
            s=$2;
            while(++i<=2){
                    getline l;
                    gsub(/^[^ ]+ /,"",l);
                    s=s" "l;
            }
            print gensub(/, /,",","g",s);
            i=0;
    }' indexdata | awk '{ a[$1]+=$2;b[$1]=b[$1]?b[$1]","$3:$3 } END {
            for(i in a){
                    lens=split(b[i],tp,",");
                    for(j=1;j<=lens;j++)tps[tp[j]];
                    for(k in tps)vs=vs?vs","k:k;
                    print i,a[i],vs;
                    vs="";
                    delete tps;
            }}' > index

    # Quote every token, wrap each line in [], join the lines with commas, wrap in {}.
    cat index ipdata | sed 's/[ \t]\+/,/g;s/[A-Za-z0-9.]\+/"&"/g;s/.*/[&]/g' | paste -s -d ',' | sed 's/.*/{&}/g'
    rm -f index ipdata indexdata
    [root@centos6-1 txt]#

I'm not sure whether this is the result you want. Give it a try.
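
One difference between the two outputs is worth pointing out. In the output of post #26, 61.4.184.91 appears twice (5587005 and 5808807), once per input file, whereas the output in post #25 shows a single total of 11395812. If per-IP totals are what is wanted, the ipdata step could sum instead of copying the lines through. A minimal sketch, where `sum_ips` is a hypothetical helper and the input is assumed to be "IP, whitespace, count":

```shell
# sum_ips FILE...: add up the counts of IP lines that share the same address.
sum_ips() {
  awk '/^[0-9]+\./ { total[$1] += $2 }
       END { for (ip in total) printf "%s %d\n", ip, total[ip] }' "$@" | sort
}

# The two values reported for 61.4.184.91 in post #26:
printf '61.4.184.91 5587005\n61.4.184.91 5808807\n' | sum_ips
```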

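#27, posted 2013-12-12 15:31
Reply to #26 reyleon

Oops, it looks like I uploaded the wrong files. The txt files I posted were made by copying and pasting the text on Windows. This is what the actual file looks like under cat -A 1.txt on Linux:
Linux version: [image]
Windows version: [image]
The test results look wrong: the times value for every appid is 0.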
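
A possible reason for the zero times values, assuming the script from post #26 was run on tab-separated files: that script strips the key with `gsub(/^[^ ]+ /,"",l)`, which only matches when the key is followed by a space. With a tab after `times:` nothing is stripped, so the next awk stage sees `times:` where it expects the number and adds 0. The sketch below compares the original pattern with one that also accepts a tab; `strip_key` is a hypothetical helper name.

```shell
# strip_key REGEX: remove the leading key from a tab-separated "times:" line
# using REGEX, mirroring the gsub step in the script from post #26.
strip_key() {
  printf 'times:\t365\n' | awk -v re="$1" '{ sub(re, ""); print }'
}

strip_key '^[^ ]+ '          # original pattern: no space to match, line unchanged
strip_key '^[^ \t]+[ \t]+'   # space or tab accepted: only 365 is left
```

Changing that one pattern in the script would be a small fix to try; whether the type line then parses as intended would still need checking against the real files.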

#28, posted 2013-12-12 16:15
Reply to #27 yu34po
When I tested, I had already converted the data you provided to Unix format. Did you test with the command I gave you?

    [root@localhost ~]# cat -A 1.txt | head
    id:^Iabb2c8$
    times:^I365$
    type:^Iforecast3d$
    id:^I68faa3$
    times:^I337375$
    type:^Iforecast, HBgqxx, forecast3h, alarm, HBxzxxz$
    id:^I5aae5b$
    times:^I122$
    type:^Iindex, observe, forecast3d$
    id:^I5b9529$
    [root@localhost ~]#
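
In `cat -A` output a tab is shown as `^I` and the end of each line as `$`; a Windows line ending would show up as `^M$`. The listing above therefore indicates tab-separated keys and Unix line endings. For larger files the same check can be automated. The following is a rough sketch with a hypothetical helper, `classify_lines`, which counts key lines by what follows the colon and counts lines that end in a carriage return.

```shell
# classify_lines FILE...: report how many "key:" lines use a tab or a space
# after the colon, and how many lines end with a carriage return (CRLF).
classify_lines() {
  awk '
    { if (sub(/\r$/, "")) kind["crlf"]++ }   # strip and count a trailing CR
    /^[a-z]+:\t/ { kind["tab"]++;   next }
    /^[a-z]+: /  { kind["space"]++; next }
    END { for (k in kind) print k, kind[k] }
  ' "$@" | sort
}
```

Running `classify_lines 1.txt` on a file like the one above should report only `tab` lines; any `space` or `crlf` count suggests the file went through a Windows editor or a copy and paste.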

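#29, posted 2013-12-12 16:18
Reply to #28 yestreenstars

I tried yours first and it didn't seem to work, then I used the other person's and the result was fine. But with the Unix-format files it doesn't work.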

#30, posted 2013-12-12 16:23
Reply to #29 yu34po
Did you test with the one from post #24?

   