
Chinaunix

Original poster: bikkuri

Text processing: extracting comment information from a long text

#1 Posted on 2018-04-01 10:23
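bikkuri posted on 2018-03-27 00:44
How should I process this text with Perl?
I can write it as a shell script, but I'm not very familiar with Perl. I tried for a long time and still couldn't get the expected ...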

Post what you wrote so the experts can see where you got stuck.

#2 Posted on 2018-04-01 11:03

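#!/usr/bin/perl
use strict;
use warnings;

# Read the whole input (files named on the command line, or STDIN) as one string.
my $text = do { local $/; <> };

# Split on the marker; the last two pieces follow the last two comment headers.
my @array = split /added a comment/, $text;

print "\n\n\n", $array[-1];
print "\n\n\n", $array[-2];

# Collapse every run of whitespace (\n, \r, \t, spaces) into a single space.
# \s already covers all of them; inside [...] a '|' would be a literal pipe.
s/\s+/ /g for @array[-2, -1];

print "\n\n\n", $array[-1];
print "\n\n\n", $array[-2];

# Print the last 75 characters of each piece around the marker.
print "\n\n\n";
print substr($array[-1], -75), "added a comment", substr($array[-2], -75);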

I don't know what the "\u003*" sequences in your text are. Check whether this is what you meant.
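The `\u003c` and `\u003e` sequences appear to be JSON-style Unicode escapes: U+003C is `<` and U+003E is `>`. That suggests the text is HTML embedded in a JSON string. Below is a minimal decoding sketch. It assumes the only escapes present are `\uXXXX`, `\"`, `\'`, `\/`, `\\` and `\n`, and the sub name `decode_escapes` is hypothetical. It decodes everything in one left-to-right pass so that an escaped backslash followed by `n` is not mistaken for a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: undo JSON-style escaping in a single pass.
sub decode_escapes {
    my ($s) = @_;
    my %simple = ('"' => '"', "'" => "'", '/' => '/', '\\' => '\\', 'n' => "\n");
    # \uXXXX becomes the code point; \" \' \/ \\ \n map through %simple.
    $s =~ s{\\(?:u([0-9a-fA-F]{4})|(["'/\\n]))}
           {defined $1 ? chr(hex $1) : $simple{$2}}ge;
    return $s;
}

my $raw = q{\u003cspan\u003eShow\u003c\/span\u003e\n added a comment};
print decode_escapes($raw), "\n";
# prints "<span>Show</span>", then " added a comment" on the next line
```

After decoding, the text is ordinary HTML, so tags can be removed or matched as `<...>` instead of as escape sequences.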

#3 Posted on 2018-04-01 11:04
/div\u003e\n\u003c\/div\u003e\n \u003c\/div\u003e\n \u003c\/div\u003e\n\"";added a comment \/\u003e\u003c\/span\u003e\u003c\/span\u003e Yuanqing XU\u003c\/a\u003e\n

#4 Posted on 2018-04-01 11:20
#!perl

$a = '...nContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Expand comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-collapsed\\\"\u003e\u003cspan\u003eShow\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-details flooded\\\"\u003e\\n                        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"FMS003720\\\" id=\\\"commentauthor_2008726_concise\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=FMS003720\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"FMS003720\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e CARES\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2008726_concise subText\'\u003e\u003cspan class=\'date user-tz\' title=\'19\\/Mar\\/18 6:54 AM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T06:54:20+0100\'\u003e19\\/Mar\\/18 6:54 AM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e                    Note Content: 03\\/19\\/18 00:53:56 julee \\nText Type: Customer modification\\/inquiry \\nHi Craig, \\n\\n Do you know what is the meaning of below logs? \\nIt seems there is some issue for authorization.  \\n\\n MINOR: CLI Command not allowed for this user.  
\\n\\n Regards, \\nJunho              \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\u003c\\/div\u003e\\n                             \\n\\n\\n\u003cdiv id=\\\"comment-2010460\\\" class=\\\"issue-data-block activity-comment twixi-block  expanded\\\"\u003e\\n    \u003cdiv class=\\\"twixi-wrap verbose actionContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Collapse comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-expanded\\\"\u003e\u003cspan\u003eHide\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-links\\\"\u003e\\n                \u003ca href=\\\"\\/browse\\/XRS-2497?focusedCommentId=2010460&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2010460\\\"\\n                   title=\\\"Right click and copy link for a permanent link to this comment.\\\" class=\\\"activitymodule-link issue-comment-action\\\"\u003e\\n                    \u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-link\\\"\u003ePermalink\u003c\\/span\u003e\u003c\\/a\u003e\\n                                            \u003c\\/div\u003e\\n            \u003cdiv class=\\\"action-details\\\"\u003e        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"yuanqinx\\\" id=\\\"commentauthor_2010460_verbose\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=yuanqinx\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"yuanqinx\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e Yuanqing XU\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2010460_verbose subText\'\u003e\u003cspan class=\'date user-tz\' 
title=\'19\\/Mar\\/18 4:34 PM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T16:34:48+0100\'\u003e19\\/Mar\\/18 4:34 PM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e  \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n        \u003cdiv class=\\\"action-body flooded\\\"\u003e\u003cp\u003eHi Junho,\u003c\\/p\u003e\\n\\n\u003cp\u003eThis could be an indication that we didn\'t receive a response from TACACS+ server in a timely manner, or we received a reject from TACACS+ server. A PCAP may help us understand what happened.\u003c\\/p\u003e\\n\\n\u003cp\u003eBest regards,\u003cbr\\/\u003e\\nDavid Xu\u003c\\/p\u003e \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n    \u003cdiv class=\\\"twixi-wrap concise actionContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Expand comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-collapsed\\\"\u003e\u003cspan\u003eShow\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-details flooded\\\"\u003e\\n                        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"yuanqinx\\\" id=\\\"commentauthor_2010460_concise\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=yuanqinx\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"yuanqinx\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e Yuanqing XU\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2010460_concise subText\'\u003e\u003cspan class=\'date user-tz\' title=\'19\\/Mar\\/18 4:34 PM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T16:34:48+0100\'\u003e19\\/Mar\\/18 4:34 
PM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e                    Hi Junho, \\n\\n This could be an indication that we didn\'t receive a response from TACACS+ server in a timely manner, or we received a reject from TACACS+ server. A PCAP may help us understand what happened. \\n\\n Best regards, \\nDavid Xu              \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\u003c\\/div\u003e\\n                             \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\"";';


$a =~ s/\\u003[a-z]//mg;

print $a;


@array = split/added a comment/, $a;

print "\n\n\n";
print $array[-1];
print "\n\n\n";
print $array[-2];

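# Collapse whitespace, literal "\n" sequences, stray backslashes and "/div" into one space.
# Use a group of alternatives, not [...]: a class matches single characters,
# so it would also delete every n, d, i and v in the text.
$array[-1] =~ s/(?:\s|\\n|\\|\/div)+/ /g;

$array[-2] =~ s/(?:\s|\\n|\\|\/div)+/ /g;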


print "\n\n\n";
print $array[-1];
print "\n\n\n";
print $array[-2];

print "\n\n\n";
print (substr($array[-1], -150));
print ("added a comment");
print (substr($array[-2], -150));

#5 Posted on 2018-04-01 11:21

po se from TACACS+ ser er a t mely ma er, or we rece e a reject from TACACS+ ser er. A PCAP may help us u ersta what happe e . Best regar s, Da Xu "";added a comment -a atar- er " mg src= "https: gree hopper.app.alcatel-luce t.com secure usera atar?s ze=xsmall&a atarI =10312 " alt= "yua q x " spa spa Yua q g XU a
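The missing letters in the output above (`ser er` for "server", `Da Xu` for "David Xu") match what happens when `\\n` and `/div` go inside square brackets, as in `[\n|\r|\t|\s|\\n|\/div]+`. A bracketed class matches single characters, so it also deletes every `n`, `d`, `i`, `v`, `/`, `\` and `|`. The minimal sketch below runs both forms on a short made-up string:

```perl
use strict;
use warnings;

# Single-quoted, so '\n' here is a literal backslash followed by 'n',
# like the escaped text in the thread.
my $text = 'from TACACS+ server\n\nDavid Xu';

# Bracketed class: each listed character is matched on its own,
# so the letters n, d, i and v are replaced as well.
(my $by_class = $text) =~ s/[\s|\\n|\/div]+/ /g;

# Alternation group: whitespace, a literal backslash-n, or the string "/div".
(my $by_alt = $text) =~ s/(?:\s|\\n|\/div)+/ /g;

print "$by_class\n";   # from TACACS+ ser er Da Xu
print "$by_alt\n";     # from TACACS+ server David Xu
```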

#6 Posted on 2018-04-02 15:27
Reply to #11 bikkuri

I didn't quite understand what result you want.

#7 Posted on 2018-04-08 09:15
#!perl

$a = '...nContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Expand comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-collapsed\\\"\u003e\u003cspan\u003eShow\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-details flooded\\\"\u003e\\n                        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"FMS003720\\\" id=\\\"commentauthor_2008726_concise\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=FMS003720\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"FMS003720\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e CARES\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2008726_concise subText\'\u003e\u003cspan class=\'date user-tz\' title=\'19\\/Mar\\/18 6:54 AM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T06:54:20+0100\'\u003e19\\/Mar\\/18 6:54 AM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e                    Note Content: 03\\/19\\/18 00:53:56 julee \\nText Type: Customer modification\\/inquiry \\nHi Craig, \\n\\n Do you know what is the meaning of below logs? \\nIt seems there is some issue for authorization.  \\n\\n MINOR: CLI Command not allowed for this user.  
\\n\\n Regards, \\nJunho              \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\u003c\\/div\u003e\\n                             \\n\\n\\n\u003cdiv id=\\\"comment-2010460\\\" class=\\\"issue-data-block activity-comment twixi-block  expanded\\\"\u003e\\n    \u003cdiv class=\\\"twixi-wrap verbose actionContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Collapse comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-expanded\\\"\u003e\u003cspan\u003eHide\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-links\\\"\u003e\\n                \u003ca href=\\\"\\/browse\\/XRS-2497?focusedCommentId=2010460&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2010460\\\"\\n                   title=\\\"Right click and copy link for a permanent link to this comment.\\\" class=\\\"activitymodule-link issue-comment-action\\\"\u003e\\n                    \u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-link\\\"\u003ePermalink\u003c\\/span\u003e\u003c\\/a\u003e\\n                                            \u003c\\/div\u003e\\n            \u003cdiv class=\\\"action-details\\\"\u003e        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"yuanqinx\\\" id=\\\"commentauthor_2010460_verbose\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=yuanqinx\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"yuanqinx\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e Yuanqing XU\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2010460_verbose subText\'\u003e\u003cspan class=\'date user-tz\' 
title=\'19\\/Mar\\/18 4:34 PM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T16:34:48+0100\'\u003e19\\/Mar\\/18 4:34 PM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e  \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n        \u003cdiv class=\\\"action-body flooded\\\"\u003e\u003cp\u003eHi Junho,\u003c\\/p\u003e\\n\\n\u003cp\u003eThis could be an indication that we didn\'t receive a response from TACACS+ server in a timely manner, or we received a reject from TACACS+ server. A PCAP may help us understand what happened.\u003c\\/p\u003e\\n\\n\u003cp\u003eBest regards,\u003cbr\\/\u003e\\nDavid Xu\u003c\\/p\u003e \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n    \u003cdiv class=\\\"twixi-wrap concise actionContainer\\\"\u003e\\n        \u003cdiv class=\\\"action-head\\\"\u003e\\n            \u003ca href=\\\"#\\\" title=\\\"Expand comment\\\" class=\\\"twixi\\\"\u003e\u003cspan class=\\\"icon-default aui-icon aui-icon-small aui-iconfont-collapsed\\\"\u003e\u003cspan\u003eShow\u003c\\/span\u003e\u003c\\/span\u003e\u003c\\/a\u003e\\n            \u003cdiv class=\\\"action-details flooded\\\"\u003e\\n                        \\n    \\n    \\n    \\n                \\n\\n    \u003ca class=\\\"user-hover user-avatar\\\" rel=\\\"yuanqinx\\\" id=\\\"commentauthor_2010460_concise\\\" href=\\\"\\/secure\\/ViewProfile.jspa?name=yuanqinx\\\"\u003e\u003cspan class=\\\"aui-avatar aui-avatar-xsmall\\\"\u003e\u003cspan class=\\\"aui-avatar-inner\\\"\u003e\u003cimg src=\\\"https:\\/\\/greenhopper.app.alcatel-lucent.com\\/secure\\/useravatar?size=xsmall&avatarId=10312\\\" alt=\\\"yuanqinx\\\" \\/\u003e\u003c\\/span\u003e\u003c\\/span\u003e Yuanqing XU\u003c\\/a\u003e\\n added a comment  - \u003cspan class=\'commentdate_2010460_concise subText\'\u003e\u003cspan class=\'date user-tz\' title=\'19\\/Mar\\/18 4:34 PM\'\u003e\u003ctime class=\'livestamp\' datetime=\'2018-03-19T16:34:48+0100\'\u003e19\\/Mar\\/18 4:34 
PM\u003c\\/time\u003e\u003c\\/span\u003e\u003c\\/span\u003e                    Hi Junho, \\n\\n This could be an indication that we didn\'t receive a response from TACACS+ server in a timely manner, or we received a reject from TACACS+ server. A PCAP may help us understand what happened. \\n\\n Best regards, \\nDavid Xu              \u003c\\/div\u003e\\n        \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\u003c\\/div\u003e\\n                             \u003c\\/div\u003e\\n    \u003c\\/div\u003e\\n\"";';



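# Decode the JSON-style escapes. Every substitution is bound to $a;
# a bare s/// would operate on $_ instead.
$a =~ s/\\u([0-9a-fA-F]{4})/chr(hex($1))/ge;   # \u003c -> <, \u003e -> >
$a =~ s/\\"/"/g;
$a =~ s/\\\\/\\/g;
$a =~ s/\\\x27/\x27/g;                          # \' -> '
$a =~ s|\\/|/|g;
$a =~ s/\\n/\n/g;
# Strip tags; /s lets a tag that spans a decoded newline match too.
$a =~ s/<.*?>//gs;
$a =~ s/&lt;.*?&gt;//g;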
print($a);

print "\n\n\n";
# print "\n\n\n";
$a =~ s/\\n+/ /g;
$a =~ s/\s+/ /g;
print($a);

@array = split/added a comment/, $a;

print "\n\n\n";
print $array[-1];
print "\n\n\n";
print $array[-2];
print "\n\n\n";
@name = split/[\s]+/, $array[-2];
print($name[-1], " ",$name[-2], " ");

$array[-1] =~ s/\\//g;
print (substr($array[-1], 0, (150 - length($name[-1]) - length($name[-2]))));

#8 Posted on 2018-04-08 09:16
XU Yuanqing  - 19/Mar/18 4:34 PM Hi Junho, This could be an indication that we didn't receive a response from TACACS+ server in a timely manner, or we r

#9 Posted on 2018-04-08 09:17
The main problem is that the text contains too many tag pairs.
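Instead of stripping every tag pair and counting characters, another option is to match only the few elements that carry data. The minimal sketch below makes several assumptions:

- The text has already been unescaped into plain HTML.
- Each full comment has a `commentauthor_<id>_verbose` link, a `<time>` element and an `action-body` div, as in the data posted in this thread.

The sample is shortened from that data, and the helper `plain` is hypothetical. Regexes over HTML are brittle, so for real pages an HTML parser (for example the CPAN module HTML::TreeBuilder) would likely be sturdier.

```perl
use strict;
use warnings;

# Sample shortened from the thread's data, already unescaped into HTML.
my $html = <<'HTML';
<a class="user-hover user-avatar" rel="yuanqinx" id="commentauthor_2010460_verbose" href="/secure/ViewProfile.jspa?name=yuanqinx"><span class="aui-avatar aui-avatar-xsmall"><span class="aui-avatar-inner"><img src="https://greenhopper.app.alcatel-lucent.com/secure/useravatar?size=xsmall&avatarId=10312" alt="yuanqinx" /></span></span> Yuanqing XU</a>
 added a comment  - <span class='commentdate_2010460_verbose subText'><span class='date user-tz' title='19/Mar/18 4:34 PM'><time class='livestamp' datetime='2018-03-19T16:34:48+0100'>19/Mar/18 4:34 PM</time></span></span>  </div>
        </div>
        <div class="action-body flooded"><p>Hi Junho,</p>

<p>This could be an indication that we didn't receive a response from TACACS+ server in a timely manner, or we received a reject from TACACS+ server. A PCAP may help us understand what happened.</p>

<p>Best regards,<br/>
David Xu</p> </div>
HTML

# Hypothetical helper: replace tags with spaces, collapse and trim whitespace.
sub plain {
    my ($s) = @_;
    $s =~ s/<[^>]*>/ /g;
    $s =~ s/\s+/ /g;
    $s =~ s/^ | $//g;
    return $s;
}

my @comments;
while ($html =~ m{
        id="commentauthor_(\d+)_verbose"                   # comment id
        .*? </span></span> ([^<]*) </a>                     # display name after the avatar spans
        .*? <time[^>]*> ([^<]*) </time>                     # human-readable date
        .*? <div\ class="action-body[^"]*"> (.*?) </div>    # comment body
    }gsx) {
    # Copy the captures before calling plain(), which runs its own regexes.
    my ($id, $author, $date, $body) = ($1, $2, $3, $4);
    push @comments, { id => $id, author => plain($author), date => $date, body => plain($body) };
}

for my $c (@comments) {
    my $line = "$c->{author} - $c->{date} $c->{body}";
    print substr($line, 0, 150), "\n";
}
```

Because each field comes from its own element, the tag soup between them does not matter. The assumed output format (author, date, then the body trimmed to 150 characters) is easy to change in the final loop.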

#10 Posted on 2018-04-08 16:17
XU Yuanqing  - 19/Mar/18 4:34 PM Hi Junho, This could be an indication that we didn't receive a response from TACACS+ server in a timely manner, or we r

Isn't this exactly the result you wanted?