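Last edited by bikkuri on 2017-05-18 00:38
Hello everyone! I have a problem I would like some help with.
I have some HTML text, and I need to extract certain data from it and generate a CSV file.
Here is a sample: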
<HTML style="background-color: #CCCCCC"><HEAD><TITLE>7x50 HCT</TITLE><link rel="stylesheet" type="text/css" href="css/myStyle.css?v=1.2.3"/><link rel="shortcut icon" href="favicon.ico"/><script type="text/javascript" src="js/hct.js?v=1.0"></script><script type="text/javascript" src="js/overlib.js"></script></HEAD><BODY ><div id="overDiv" style="position:absolute; visibility:hidden; z-index:1000;"></div><center><center><table class="intro" width="99%" height="98%" border=0 cellpadding=0 cellspacing=0><tr><td align=center><table border=0; cellpadding="0" cellspacing="0" id="top1" width="98%" style="border-bottom:1px solid black; margin-top: 10px;"><tr><th nowrap width="22%" style="padding-left: 10px; padding-right: 10px;" class="newproperty"><style="margin-top: 1px; margin-left: 5px"><b style="font-size: 15pt;"> Health Check Tool v5 </b></th><th id="welcome" valign="middle" align="center" style="color:#333333; font-weight: bold"><form action="logon.php" method="POST">Welcome ZHU John! | <a href="index.php" style="color: #000000">UPLOAD</a> | <input type="hidden" name="logout" value="ZHU John,,0"><input class="btn" type="submit" value="Log out" style="font-size:8pt" /></form></th></tr></table><TABLE id="content" WIDTH="98%" BORDER="0" valign ="top" style="border-left:1px solid #606060; border-right: 1px solid #606060; border-bottom: 2px solid #606060;"><TR><TD WIDTH="20%" ALIGN="LEFT" VALIGN="TOP"><table class="imagetable" border="3" align="center" cellpadding="0"><tr><th class="newproperty" colspan="3"><i> REPORT </i></th></tr><tr><td class="level_1" colspan="2">LEVEL 1</td></tr><tr><td class="darknav" >Error</td><td class="diff">Information requires further analysis by TEC</td></tr><tr><td class="darknav" >Component information</td><td class="value">IMM 7</td></tr><tr><td class="darknav" >Information</td><td class="value">TEC needs to review the TS files to determine the root cause.</td></tr><tr><td class="darknav" >Action Plan</td><td class="value">Escalate 
according to the severity.</td></tr><tr><td class="level_2" colspan="2">LEVEL 2</td></tr><tr><td class="darknav" >Error</td><td class="diff">Enable vprn-network-exceptions</td></tr><tr><td class="darknav" >Component information</td><td class="value">SF/CPM A[integrated]</td></tr><tr><td class="darknav" >Information</td><td class="value">General Recommendation: Implement TA 12-1435. <br>Enable vprn-network-exceptions under "config>system>security#" context.</td></tr><tr><td class="darknav" >Action Plan</td><td class="value"><a href="DOCS/7x50/TSN/TA12-1435.pdf" target="myspot">TA12-1435.pdf</a></td></tr></table></td><td width="70%" height="100%" align="center" VALIGN="TOP" rowspan="6"><iframe src="../HCT/upload/jzhu039/errorInfo.html" style="border:1px solid #CCCCCC" name="myspot" width="98%" height="100%"></iframe></td></tr><TR><TD WIDTH="20%" ALIGN="LEFT" VALIGN="TOP"><table class="imagetable" border="3" VALIGN="TOP" align="left" cellpadding="0"><tr><th class="newproperty" colspan="3"><i>ERROR DETAILS</i></th></tr><tr><td class="value" colspan="3"><a href="../HCT/upload/jzhu039/errorInfo.html" target="myspot"><img src="images/view.png"></a></td></tr></table></TD></TR><TR><TD WIDTH="20%" ALIGN="LEFT" VALIGN="TOP"><table class="imagetable" border="3" VALIGN="TOP" align="left" cellpadding="0"><tr><th class="newproperty" colspan=3><i>ANALYSIS / COMPARING TOOLS</i></th></tr><tr><td class="darknav"> </td><td class="darknav">DOWNLOAD</td><td class="darknav">DIFF</td></TR><tr><td class="value">CLI Show Output</td><td class="value"><a href="../HCT/upload/jzhu039/cliShow.zip"><img src="images/download.gif"></a></td><td class="value"><a href="diff.php?f1=../HCT/upload/jzhu039/ar1cta1gru.ts1.txt.show&f2=../HCT/upload/jzhu039/ar1cta1gru.ts2.txt.show" target="myspot"><img src="images/view.png"></a></td></tr><tr><td class="value">LOG 99 & 100</td><td class="value"><a href="../HCT/upload/jzhu039/logs.zip"><img src="images/download.gif"></a></td><td class="value"><a 
href="diff.php?f1=../HCT/upload/jzhu039/ar1cta1gru.ts1.txt.log&f2=../HCT/upload/jzhu039/ar1cta1gru.ts2.txt.log" target="myspot"><img src="images/view.png"></a></td></tr><tr><td class="value">CONFIG</td><td class="value"><a href="../HCT/upload/jzhu039/configs.zip"><img src="images/download.gif"></a></td><td class="value"><a href="diff.php?f1=../HCT/upload/jzhu039/ar1cta1gru.ts1.txt.cfg&f2=../HCT/upload/jzhu039/ar1cta1gru.ts2.txt.cfg" target="myspot"><img src="images/view.png"></a></td></tr><tr><td class="value">PORT STATS</td><td class="value"><a href="../HCT/upload/jzhu039/portStats.zip"><img src="images/download.gif"></a></td><td class="value"><a href="diff.php?f1=../HCT/upload/jzhu039/ar1cta1gru.ts1.txt.portstats&f2=../HCT/upload/jzhu039/ar1cta1gru.ts2.txt.portstats" target="myspot"><img src="images/view.png"></a></td></tr><tr><td class="value">CLI History</td><td class="value"><a href="../HCT/upload/jzhu039/cliHistory.zip"><img src="images/download.gif"></a></td><td class="value"><a href="diff.php?f1=../HCT/upload/jzhu039/ar1cta1gru.ts1.txt.hist&f2=../HCT/upload/jzhu039/ar1cta1gru.ts2.txt.hist" target="myspot"><img src="images/view.png"></a></td></tr><tr><td class="value">All</td><td class="value"><a href="../HCT/upload/jzhu039/All.zip"><img src="images/download.gif"></a></td></table></TD></TR><TR><TD width="20%" VALIGN="TOP"><table class="imagetable" border="3" align="center" cellpadding="0"><tr><th class="newproperty" colspan="3"><i>TIME INFO</i></th><tr><td class="darknav"></td><td class="darknav">CAPTURED</td><td class="darknav">DIFF</td></tr><tr><td class="value"> ar1cta1gru.ts1.txt </td><td class="value"> FRI MAR 24 12:10:42 2017 UTC
</td><td class="value" rowspan="2"><font color="black"> 168:00:2</font></td></tr><tr><td class="value"> ar1cta1gru.ts2.txt </td><td class="value"> FRI MAR 31 12:10:44 2017 UTC
</td></tr></table></td></tr><TR><TD width="20%" height="40%"> <BR><BR><BR><BR><BR><BR><BR><BR><BR><BR></TD></TR></table></td></FORM></tr></table></TD></TR></TABLE></BODY></HTML><HTML style="background-color: #F6EEEE"><HEAD><link rel="stylesheet" type="text/css" href="../../css/myStyle.css?v=1.2.3"/><script type="text/javascript" src="../../js/hct.js?v=1.0"></script><script type="text/javascript" src="../../js/overlib.js"></script></HEAD><BODY style="background-color: #F6EEEE"><table class="system" width="100%" border="1" align="center" valign="top" cellpadding="1" cellspacing="1" bgcolor="#CCCCCC"><tr><td class="type" width="40%"> System Name </td><td class="data"><b>ar1.cta1.gru
</b></td></tr><tr><td class="type" width="40%"> System Type </td><td class="data"><b>7750 SR-12
</b></td></tr><tr><td class="type" width="40%"> System Version </td><td class="data"><b>C-12.0.R6
</b></td></tr><tr><td class="type" width="40%"> Chassis MAC </td><td class="data"><b>e4:81:84:2d:9c:0f
</b></td></tr><tr><td class="infodata" width="40%"> TS file 1 : ar1cta1gru.ts1 </td><td class="data"><b>Information as of FRI MAR 24 12:10:42 2017 UTC
</b></td></tr><tr><td class="infodata" width="40%"> TS file 2 : ar1cta1gru.ts2 </td><td class="data"><b>Information as of FRI MAR 31 12:10:44 2017 UTC
</b></td></tr></table><table class="system" style="background-color: #E4B77B" width="100%" border="0" align="center" cellpadding="0" cellspacing="1"><tr><td style="color:#333333; font-weight: bold;" align="center" colspan="8"><i>ERRORs</i></td></tr><tr><th class="categories">Component Information</th><th class="categories">Error</th><th class="categories">Level</th><th class="categories">Description</th><th class="categories">Action Plan</th></tr><tr><td colspan="5" align="left" style="padding-left: 320px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> CHASSIS 1 </font> - [<i>Standalone</i>] - [<i>200G per slot capable</i>] - [<i>NS142861360
</i>] - [<i>2016/12/23 11:54:50
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> CPM A </font> - [<i>sfm4-12</i>] - [<i>sfm4-12</i>] - [<i>up</i>] - [<i>up_active</i>] - [<i>NS1424F0495
</i>] - [<i>2016/12/23 11:54:50
</i>]</td></tr><tr><td class="diff" >SF/CPM A[integrated]</td><td class="bookedl2" >Enable vprn-network-exceptions</td><td class="diff" >2</td><td class="diff" colspan=2>General Recommendation: Implement TA 12-1435. <br>Enable vprn-network-exceptions under "config>system>security#" context.</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> CPM B </font> - [<i>sfm4-12</i>] - [<i>sfm4-12</i>] - [<i>up</i>] - [<i>up_standby</i>] - [<i>NS1425F0644
</i>] - [<i>2016/12/29 01:23:16
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 1 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS141263567
</i>] - [<i>2016/12/23 11:55:43
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 2 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS1424F1146
</i>] - [<i>2016/12/23 11:55:41
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 3 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS1424F1035
</i>] - [<i>2016/12/23 11:55:41
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IOM 4 </font> - [<i>iom3-xp</i>] - [<i>iom3-xp</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS142663050
</i>] - [<i>2016/12/23 11:55:42
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 380px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> MDA 4/1 </font> - [<i>m20-1gb-xp-sfp</i>] - [<i>m20-1gb-xp-sfp</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS1415F0994
</i>] - [<i>2016/12/23 11:55:58
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 380px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> MDA 4/2 </font> - [<i>m20-1gb-xp-sfp</i>] - [<i>m20-1gb-xp-sfp</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS1415F0669
</i>] - [<i>2016/12/23 11:55:58
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 5 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS152168464
</i>] - [<i>2016/12/23 11:55:41
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 6 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS152168452
</i>] - [<i>2016/12/23 11:55:42
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 7 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS152168467
</i>] - [<i>2016/12/23 11:55:46
</i>]</td></tr><tr><td class="diff" >IMM 7</td><td class="booked" >Information requires further analysis by TEC</td><td class="diff" >1</td><td class="diff" colspan=2>TEC needs to review the TS files to determine the root cause.</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 8 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS152168463
</i>] - [<i>2017/02/23 13:14:09
</i>]</td></tr><tr><td colspan="5" align="left" style="padding-left: 350px; background: #FFFFFF; color: #432F21; font-size: 8pt; font-family: arial, sans-serif;"><font style="font-weight: bold; font-size: 12pt;"> IMM 9 </font> - [<i>imm-2pac-fp3</i>] - [<i>imm-2pac-fp3</i>] - [<i>up</i>] - [<i>up</i>] - [<i>NS152168460
</i>] - [<i>2016/12/23 11:55:41
</i>]</td></tr></table>
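The HTML source may contain many errors, each introduced by a LEVEL 1, LEVEL 2, or LEVEL 3 heading.
I would like to process these errors in the HTML source and output a CSV file like the following: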
"NODE","LEVEL","ERROR","COMPONENT INFORMATION","INFORMATION","ACTION PLAN"
"ar1.cta1.gru","LEVEL 1","Information requires further analysis by TEC","IMM 7","TEC needs to review the TS files to determine the root cause.","Escalate according to the severity."
"ar1.cta1.gru","LEVEL 2","Enable vprn-network-exceptions","SF/CPM A[integrated]","General Recommendation: Implement TA 12-1435. <br>Enable vprn-network-exceptions under "config>system>security#" context.","TA12-1435.pdf"
If no errors are found, only the system name should be output:
"NODE","LEVEL","ERROR","COMPONENT INFORMATION","INFORMATION","ACTION PLAN"
"ar1.cta1.gru"
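One way the mapping described above could be sketched is shown here, in Python rather than awk, using only regular expressions that rely on the markup visible in the sample: the `level_N` cells, the `darknav` label cells followed by a `diff` or `value` cell, and the "System Name" row. The names `report_to_csv` and `clean` are hypothetical helpers, not part of any tool. It assumes every report uses the same class names as the sample, and it writes standard CSV, so a double quote inside a field is doubled (`""config>system>security#""`) instead of appearing bare as in the example line above.

```python
import csv
import io
import re

HEADER = ["NODE", "LEVEL", "ERROR", "COMPONENT INFORMATION",
          "INFORMATION", "ACTION PLAN"]

# Node name: the bold text in the cell that follows "System Name".
NODE_RE = re.compile(r"System Name\s*</td>\s*<td[^>]*>\s*<b>(.*?)</b>",
                     re.I | re.S)

# Either a "LEVEL n" heading cell, or a label/value pair of the REPORT table
# (class names are taken from the sample and assumed to be stable).
CELL_RE = re.compile(
    r'<td class="level_\d+"[^>]*>\s*(?P<level>LEVEL \d+)\s*</td>'
    r'|<td class="darknav"\s*>\s*(?P<label>[^<]+?)\s*</td>\s*'
    r'<td class="(?:diff|value)"[^>]*>(?P<value>.*?)</td>',
    re.I | re.S,
)


def clean(fragment):
    """Drop <a ...> wrappers (keeping the link text) and collapse whitespace."""
    text = re.sub(r"</?a\b[^>]*>", "", fragment, flags=re.I)
    return " ".join(text.split())


def report_to_csv(html):
    """Return the CSV text for one HTML report (hypothetical helper)."""
    node_match = NODE_RE.search(html)
    node = clean(node_match.group(1)) if node_match else ""

    rows, current = [], None
    for m in CELL_RE.finditer(html):
        if m.group("level"):
            # A LEVEL heading starts a new error record.
            current = {"NODE": node, "LEVEL": m.group("level").upper()}
            rows.append(current)
        elif current is not None:
            key = m.group("label").upper()
            if key in HEADER:
                current[key] = clean(m.group("value"))

    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(HEADER)
    if not rows:
        writer.writerow([node])  # no errors: system name only
    for row in rows:
        writer.writerow([row.get(col, "") for col in HEADER])
    return out.getvalue()
```

For a saved report, something like `print(report_to_csv(open("output-1.jzhu039.txt").read()), end="")` would print the CSV. The same patterns (match a LEVEL cell, then collect the label/value pairs that follow it) could be ported to awk by reading the whole file into one string and looping with `match()`.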
Here are two sample HTML source files:
output-1.jzhu039.txt (12.79 KB)
output-2.jzhu039.txt (9.68 KB)
Here is the output for ar1.cta1.gru:
ar1.cta1.gru_HCT_output.pdf (328.66 KB)
How can I use awk to process this kind of HTML file, extract the required information, and output it in the required format?
Thank you all.