[Text Processing] Printing CSV fields one per line

Posted 2020-10-10 02:46
Last edited by bikkuri on 2020-10-10 02:49

Hello everyone, I have a question I'd like help with.
Here is a sample CSV file:
"Support Ticket Number","Subject","Country","Account Name","Contact Name","Date/Time Opened","Assigned Date/Time","Age (Hours)","Support Ticket Owner","Reason for Internal Status","Status","Milestone Status","Assessed Priority","Severity","Product Name","Product Variant Name","Related Cases Reference Number"
"00232281","Why there is PmFileNotificationMissing alarm on NSP? How to clean it permanently","Myanmar","OOREDOO MYANMAR LIMITED","Sareth Mak","2/7/2020 11:36 AM","2/7/2020 11:50 AM","1446.0000000000","Bozu Marugao","Bozu Marugao","Completed","","Medium","Major","NSP (Network Services Platform)","NFM-P (Network Functions Manager - Packet)","NSPD-251444"
"00237425","We have an issue February 11, 2020 measurement missing at G2102 & G 2103 (CMG file) around 00 - 01 AM, we try to backup manually data file to NSP but cannot generate and result blank (as attached). Is it any other work around to recovery this issue?","Indonesia","PT HUTCHISON 3 INDONESIA","Amin Susmanto","2/12/2020 10:20 AM","2/12/2020 11:01 AM","1330.0000000000","Bozu Marugao","Bozu Marugao","Completed","Closed Violation","Low","Major","NSP (Network Services Platform)","NFM-P (Network Functions Manager - Packet)",""


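I would like to process it as follows:
1. Discard the first line (the header).
2. From the second line on, print the content of each quoted field on its own line, without the surrounding quotes; if a field is empty, print an empty line.
For the sample above, the expected output is: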
00232281
Why there is PmFileNotificationMissing alarm on NSP? How to clean it permanently
Myanmar
OOREDOO MYANMAR LIMITED
Sareth Mak
2/7/2020 11:36 AM
2/7/2020 11:50 AM
1446.0000000000
Bozu Marugao
Bozu Marugao
Completed

Medium
Major
NSP (Network Services Platform)
NFM-P (Network Functions Manager - Packet)
NSPD-251444
00237425
We have an issue February 11, 2020 measurement missing at G2102 & G 2103 (CMG file) around 00 - 01 AM, we try to backup manually data file to NSP but cannot generate and result blank (as attached). Is it any other work around to recovery this issue?
Indonesia
PT HUTCHISON 3 INDONESIA
Amin Susmanto
2/12/2020 10:20 AM
2/12/2020 11:01 AM
1330.0000000000
Bozu Marugao
Bozu Marugao
Completed
Closed Violation
Low
Major
NSP (Network Services Platform)
NFM-P (Network Functions Manager - Packet)


Thanks, everyone!

Posted 2020-10-10 09:33
awk -F, 'NR>1 {for(i=1;i<=NF;i++)print substr($i,2,length($i)-2)}'

Posted 2020-10-10 09:57
Last edited by bikkuri on 2020-10-10 10:27

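Reply to #2 reyleon

Thank you for your help.
However, a bare comma cannot be used as the field separator, because the text inside the quotes may itself contain commas.
The separator should be "," (quote, comma, quote); otherwise the following text, which should stay on one line, is split into three lines: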
We have an issue February 11, 2020 measurement missing at G2102 & G 2103 (CMG file) around 00 - 01 AM, we try to backup manually data file to NSP but cannot generate and result blank (as attached). Is it any other work around to recovery this issue?

After changing it to the following, the output is correct:
awk -F'","' 'NR>1 {for(i=1;i<=NF;i++){if(i==1){$1=substr($1,2,length($1)-1)};if(i==NF){$NF=substr($NF,1,length($NF)-1)};print $i}}' "$sourcefile" > "$targetfile"
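
To see the difference between the two separators concretely, here is a minimal Python sketch. The row is a shortened, made-up example in the same all-quoted style as the sample, not real data, and the variable names are only illustrative.

```python
# A shortened, made-up row in the same style as the sample (every field quoted).
line = '"00237425","missing at G2102, we try to backup, result blank","Indonesia",""'

# Splitting on a bare comma also cuts inside the quoted Subject field.
naive = [part.strip('"') for part in line.split(",")]

# Dropping the outer quotes and splitting on the three characters ","
# keeps embedded commas intact, as long as every field is quoted.
fields = line[1:-1].split('","')

print(len(naive))   # 6 pieces: the Subject text was cut at its two commas
print(len(fields))  # 4 fields, the last one empty
```

This mirrors what the awk command does with -F'","' plus trimming the first and last field. It still assumes that every field is quoted and that no field contains the sequence "," itself.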

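Posted 2020-10-10 13:35
The original poster already has awk working, so I won't say more about that.
When I run into CSV these days, I prefer Python's csv library: there is no need to care whether each field is quoted, and no need to strip the quotes by hand.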


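import csv

# newline="" lets the csv module handle line endings inside quoted fields
with open("1.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    with open("results.txt", "w") as fw:
        for row in reader:
            # one field per line; an empty field becomes an empty line
            fw.write("\n".join(row))
            fw.write("\n")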
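
As a rough illustration of the "no need to care about quotes" point, the sketch below feeds the csv module a small made-up input held in memory (not the thread's data) that mixes an unquoted field, an embedded comma, a doubled quote and an empty field.

```python
import csv
import io

# Made-up data: a header, then one row mixing an unquoted field, an embedded
# comma, a doubled quote ("" inside a quoted field) and an empty field.
sample = 'id,subject,note,ref\n00000001,"a, b","say ""hi""",""\n'

reader = csv.reader(io.StringIO(sample))
next(reader)  # discard the header row
rows = list(reader)

for row in rows:
    # one field per line; the empty last field prints as an empty line
    print("\n".join(row))
```

The reader returns the four fields with the quotes already removed and the doubled quote collapsed to a single one, which is the part that a hand-written separator or regular expression would have to handle separately.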

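Posted 2020-10-13 09:47
Reply to #3 bikkuri

Then let's change it and drop -F,

Use the built-in variable FPAT instead, which describes the content of a field rather than the separator. It requires gawk 4.0 or later.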



[root@hk tmp]# cat file1
"Support Ticket Number","Subject","Country","Account Name","Contact Name","Date/Time Opened","Assigned Date/Time","Age (Hours)","Support Ticket Owner","Reason for Internal Status","Status","Milestone Status","Assessed Priority","Severity","Product Name","Product Variant Name","Related Cases Reference Number"
"00232281","Why there is PmFileNotificationMissing alarm on NSP? How to clean it permanently","Myanmar","OOREDOO MYANMAR LIMITED","Sareth Mak","2/7/2020 11:36 AM","2/7/2020 11:50 AM","1446.0000000000","Bozu Marugao","Bozu Marugao","Completed","","Medium","Major","NSP (Network Services Platform)","NFM-P (Network Functions Manager - Packet)","NSPD-251444"
"00237425","We have an issue February 11, 2020 measurement missing at G2102 & G 2103 (CMG file) around 00 - 01 AM, we try to backup manually data file to NSP but cannot generate and result blank (as attached). Is it any other work around to recovery this issue?","Indonesia","PT HUTCHISON 3 INDONESIA","Amin Susmanto","2/12/2020 10:20 AM","2/12/2020 11:01 AM","1330.0000000000","Bozu Marugao","Bozu Marugao","Completed","Closed Violation","Low","Major","NSP (Network Services Platform)","NFM-P (Network Functions Manager - Packet)",""
[root@hk tmp]#
[root@hk tmp]#
[root@hk tmp]# awk 'BEGIN{FPAT="\"[^\"]*\""}NR>1{for(i=1;i<=NF;i++)print substr($i,2,length($i)-2) }' file1
00232281
Why there is PmFileNotificationMissing alarm on NSP? How to clean it permanently
Myanmar
OOREDOO MYANMAR LIMITED
Sareth Mak
2/7/2020 11:36 AM
2/7/2020 11:50 AM
1446.0000000000
Bozu Marugao
Bozu Marugao
Completed

Medium
Major
NSP (Network Services Platform)
NFM-P (Network Functions Manager - Packet)
NSPD-251444
00237425
We have an issue February 11, 2020 measurement missing at G2102 & G 2103 (CMG file) around 00 - 01 AM, we try to backup manually data file to NSP but cannot generate and result blank (as attached). Is it any other work around to recovery this issue?
Indonesia
PT HUTCHISON 3 INDONESIA
Amin Susmanto
2/12/2020 10:20 AM
2/12/2020 11:01 AM
1330.0000000000
Bozu Marugao
Bozu Marugao
Completed
Closed Violation
Low
Major
NSP (Network Services Platform)
NFM-P (Network Functions Manager - Packet)

[root@hk tmp]#
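
For readers without gawk, the same "describe the field, not the separator" idea can be sketched in Python with a regular expression. This is only an illustration under the same assumption as the FPAT pattern (every field is double-quoted and contains no quote characters); the function names are made up for this example, and the three sample lines reuse a few column names and values from the data above.

```python
import re

# Same idea as the FPAT pattern: a quote, any run of non-quote characters, a quote.
FIELD = re.compile(r'"([^"]*)"')

def quoted_fields(line):
    """Return the contents of every double-quoted field in one CSV line."""
    return FIELD.findall(line)

def flatten(lines):
    """Skip the first line, then yield one field per output line."""
    for number, line in enumerate(lines, start=1):
        if number > 1:
            yield from quoted_fields(line)

sample = [
    '"Status","Milestone Status","Assessed Priority"',
    '"Completed","","Medium"',
    '"Completed","Closed Violation","Low"',
]
print("\n".join(flatten(sample)))
```

Because the capture group excludes the quotes, no substr-style trimming is needed. A field containing an escaped quote would break this pattern, just as it would break the FPAT version.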

Posted 2020-10-13 10:04
perl -nE'$.>1&&say$1while/"([^"]*)"/g' file

Posted 2020-10-13 14:21
Reply to #6 legs

That command is really slick!

Posted 2020-10-14 13:44
Reply to #7 reyleon

Hi, Liushen!

Posted 2020-10-28 12:10
Using Python or Perl for this is certainly more convenient than doing it directly in shell.