Chinaunix


How can I get rid of all these garbage characters?

#1 Posted 2004-04-19 21:48
  10018 CLOSED FIXED
  1018 CLOSED FIXED
  1019 CLOSED FIXED
  1013 CLOSED WONTFIX Closing as WONTFIX.</PRE>
  10133 CLOSED FIXED
  10134 CLOSED FIXED
  10135 CLOSED FIXED Closing as FIXED.</PRE>
  10136 CLOSED FIXED
  10138 CLOSED FIXED
  10139 CLOSED FIXED
  10141 CLOSED FIXED
  1014 CLOSED FIXED
  10143 CLOSED FIXED
  10144 CLOSED FIXED
  10145 CLOSED FIXED
  10146 CLOSED LATER
  10153 CLOSED FIXED
  10198 CLOSED FIXED
  107 CLOSED FIXED
  1037 CLOSED FIXED
  1054 CLOSED FIXED
  1063 CLOSED FIXED
  1065 CLOSED FIXED
  1067 CLOSED FIXED
  1070 CLOSED DUPLICATE
  103 CLOSED WONTFIX
  1030 CLOSED DUPLICATE
  1034 CLOSED FIXED
  10360 RESOLVED FIXED
  10367 RESOLVED FIXED
  10369 CLOSED FIXED
  10385 CLOSED FIXED
  1039 CLOSED FIXED
  1041 CLOSED LATER closing as 'LATER' due to no activity - please reopen when the
  1045 NEW <PRE>Set to &NEW&</PRE>
  10437 CLOSED FIXED
  10441 CLOSED -DNEW_SOLAR -D_USE_NAMESPACE=1 -DSTLPORT_VERSION=400 -D__DMAKE -DUNIX -DNEW_SOLAR -D_USE_NAMESPACE=1 -DSTLPORT_VERSION=400 -D__DMAKE -DUNIX -DNEW_SOLAR -D_USE_NAMESPACE=1 -DSTLPORT_VERSION=400 -D__DMAKE -DUNIX -DNEW_SOLAR -D_USE_NAMESPACE=1 -DSTLPORT_VERSION=400 -D__DMAKE -DUNIX FIXED
  10463 CLOSED FIXED
  10477 STARTED
  1054 CLOSED FIXED
  10535 CLOSED FIXED
  10536 CLOSED INVALID
  10564 CLOSED FIXED
  10693 NEW
  10740 CLOSED FIXED
  10760 NEW
  10788 STARTED
  10789 CLOSED FIXED
  10795 CLOSED WORKSFORME
  10797 CLOSED FIXED
  1080 CLOSED DUPLICATE
  10803 CLOSED DUPLICATE
  1080 CLOSED FIXED
  10859 RESOLVED FIXED
  10895 STARTED <PRE>Change to NEW status</PRE>
  10905 CLOSED FIXED
  10908 RESOLVED FIXED
  10938 CLOSED WORKSFORME
  10987 CLOSED WORKSFORME
  11006 CLOSED FIXED
  1106 CLOSED FIXED
  11049 STARTED
  11068 VERIFIED FIXED
  11149 CLOSED FIXED
  113 STARTED
  1133 CLOSED FIXED
  11337 CLOSED FIXED
  11343 CLOSED WONTFIX
  11386 STARTED
  11454 CLOSED FIXED
  11479 STARTED
  11593 CLOSED FIXED
  1163 CLOSED INVALID
  11688 CLOSED DUPLICATE
  11700 CLOSED FIXED
  11740 RESOLVED LATER
  1174 CLOSED FIXED
  11755 CLOSED FIXED
  11816 CLOSED WORKSFORME
  1190 CLOSED DUPLICATE
  1197 CLOSED FIXED
  11939 CLOSED <PRE>Mark CLOSED</PRE> FIXED
  11945 CLOSED FIXED
  11971 CLOSED <PRE>Mark CLOSED</PRE> FIXED
  1197 RESOLVED LATER issue as WONTFIX instead of reassigning back to the previous assignee
  1006 CLOSED LATER
  1017 NEW
  1073 UNCONFIRMED
  1085 NEW
  109 NEW
  1100 CLOSED FIXED
  1163 CLOSED FIXED
  1180 RESOLVED FIXED
  173 CLOSED WORKSFORME
  1357 CLOSED FIXED
  1411 STARTED
  1437 CLOSED FIXED
  1455 VERIFIED FIXED
  1576 CLOSED DUPLICATE
  1645 CLOSED FIXED
  1656 RESOLVED FIXED
  1663 CLOSED FIXED
  171 CLOSED FIXED <PRE>Seen okay in CWS. FIXED and corrected some bugtracking flags.</PRE> <PRE>JSI-&DVO: Haven't verified the new beaviour, set this bug again to FIXED.</PRE> Status FIXED.</PRE>
  1745 CLOSED WORKSFORME
  1771 STARTED
  178 CLOSED FIXED
  1804 CLOSED FIXED
  1808 CLOSED FIXED
  183 RESOLVED FIXED
  1833 CLOSED FIXED
  1854 NEW
  1858 CLOSED WORKSFORME
  1863 RESOLVED FIXED
  187 CLOSED FIXED
  1878 CLOSED WORKSFORME
  1879 CLOSED FIXED
  189 STARTED
  1904 CLOSED FIXED
  195 STARTED
  1970 RESOLVED FIXED
  197 RESOLVED FIXED
  1973 RESOLVED FIXED
  1975 RESOLVED FIXED
  1979 CLOSED FIXED
  1985 CLOSED FIXED
  13007 CLOSED FIXED
  13047 CLOSED WORKSFORME
  13057 CLOSED INVALID
  13058 CLOSED DUPLICATE
  13085 CLOSED FIXED
  13090 CLOSED FIXED
  13097 CLOSED FIXED
  13107 CLOSED FIXED
  13134 NEW
  13169 CLOSED FIXED
  133 RESOLVED WONTFIX
  138 CLOSED INVALID
  1348 NEW
  1356 CLOSED FIXED
  1369 CLOSED DUPLICATE
  138 CLOSED WORKSFORME
  13365 CLOSED WONTFIX <PRE>vq-&ause: I think this is a clear WONTFIX! <PRE>I resolve this issue as WONTFIX because to fix it, it would
  13371 CLOSED FIXED


The correct output should look entirely like this (at most 3 columns):

  13371 CLOSED FIXED
  13385 CLOSED FIXED
  13395 STARTED
  13405 CLOSED FIXED
  13431 RESOLVED FIXED
  13444 RESOLVED FIXED
  13448 RESOLVED DUPLICATE
  13464 CLOSED WORKSFORME
  13491 RESOLVED WORKSFORME
  13496 CLOSED FIXED
  13498 CLOSED WORKSFORME
  1353 CLOSED WORKSFORME
  13533 STARTED
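A minimal sketch of one way to get the three-column output from the raw listing: keep the ID and the status, then add the first resolution keyword found anywhere on the line, and skip any line that does not begin with a numeric ID followed by a known status (for example the tail of a wrapped line). The status and resolution keyword lists are assumptions taken only from the sample above, not a complete list from the bug tracker, and `filter_bug_lines` is a hypothetical name.

```shell
# filter_bug_lines: read the raw listing on stdin and print at most three
# columns: ID, STATUS and (if present) the first RESOLUTION keyword on the line.
# ASSUMPTION: the keyword lists come from the sample data only.
filter_bug_lines() {
  awk '
    BEGIN {
      split("NEW UNCONFIRMED STARTED RESOLVED VERIFIED CLOSED", s, " ")
      for (i in s) status[s[i]] = 1
      split("FIXED WONTFIX LATER DUPLICATE INVALID WORKSFORME", r, " ")
      for (i in r) resolution[r[i]] = 1
    }
    # Skip fragments of wrapped lines: they do not start with a
    # numeric ID followed by a known status keyword.
    $1 !~ /^[0-9]+$/ || !($2 in status) { next }
    {
      out = $1 " " $2
      # Take the first resolution keyword, wherever the junk puts it.
      for (i = 3; i <= NF; i++)
        if ($i in resolution) { out = out " " $i; break }
      print out
    }
  '
}
```

For example, `filter_bug_lines < bugs.txt`, where `bugs.txt` is a hypothetical file holding the raw listing. Scanning for a known keyword, rather than trying to describe the junk, works because the junk in the sample has no fixed pattern while the wanted values come from a small fixed set.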

#2 Posted 2004-04-20 09:16

I can't see any pattern.

#3 Posted 2004-04-20 09:17
[Note: the author has been banned or deleted; this post's content is hidden automatically.]

#4 Posted 2004-04-20 12:09

Haha, problem solved. I switched to a different line of thinking. Thanks, everyone, for racking your brains for me.

#5 Posted 2004-04-21 01:13

[quote]Originally posted by "labrun": I can't see any pattern.[/quote]

Hehe, I really couldn't see a pattern either, and yet he solved it.

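#6 Posted 2004-04-21 01:14

[quote]Originally posted by "maxx": Haha, problem solved. I switched to a different line of thinking. Thanks, everyone, for racking your brains for me.[/quote]

Hehe, tell us about your approach, don't be stingy!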

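#7 Posted 2004-04-21 11:52

Just passing by; I'll answer on maxx's behalf.

maxx and I are classmates. Our task was to find, in a set of HTML files, the descriptive text attached to certain keywords. These descriptions come from a small, fixed set of strings. As far as I know, his original approach was to search the files directly for those description strings and filter them out, and that filtering is where the problem described above appeared.

Later we changed the method: first locate the keywords in the page, then extract the text from the position right after each keyword, where the description is expected. This reduced the chance of errors, and the problem is more or less solved for now. The "different line of thinking" maxx mentioned was really just an ambiguous way of putting it.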

Still, to be fair, I think what we have now is rather fragile. For example, one HTML file contains the following code:
<TR>
        <TH><B><A HREF="/scdocs/issue_lifecycle.html" onclick="return launch(this.href, 1)" title="Note: link may open in new window" class="helplink">Status:</A></B></TH>
        <TD>RESOLVED</TD>
We look for the keyword Status and then extract RESOLVED from the line after it. With help from this forum, I used the following code:
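  # Print the line after ">Status:<", then strip leading whitespace and the <TD>...</TD> wrapper
  sed -n '/>Status:</{n;p;}' "$i" | sed 's/^[[:space:]]*<TD>\(.*\)<\/TD>/\1/'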

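But I think my code is quite fragile. In HTML, splitting this fragment over two lines is certainly good style, but putting it all on one line is just as valid and renders the same. In a more extreme case, it could even be written as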
  <TH><B><A HREF="/scdocs/issue_lifecycle.html" onclick="return launch(this.href, 1)" title="Note: link may open in new window" class="helplink">Status:</A></B></TH>

  <TD>

  RESOLVED

  </TD>

and it would still be valid. In that case my shell script simply can't cope. I'd like to ask the experts here how to handle this.

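I can see there are other classmates on this forum too; what solutions did you come up with?
The assignment has already been handed in, and I'm not after a high mark. I've learned a lot from this forum and am raising this purely out of interest. Advice from the experts here is very welcome.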

PS: Is this off-topic, or would it be better to start a new thread?
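One way to make the extraction independent of line breaks, sketched under the assumption that the tags are uppercase and the value is a single uppercase word, as in the sample above: join the whole file into one line with `tr`, then let the regular expression skip any tags and whitespace between `Status:` and the `<TD>` cell. `extract_status` is a hypothetical name, and a real HTML parser would still be more robust than a regular expression.

```shell
# extract_status FILE: print the value of the <TD> cell that follows the
# "Status:" header cell, however the tags are split across lines.
# ASSUMPTIONS: uppercase tags, a single uppercase word as the value, and
# only one "Status:" per file (the leading .* picks the last one).
extract_status() {
  # 1. Replace every newline with a space so the file becomes one line.
  # 2. After "Status:", skip any run of tags or whitespace up to <TD>,
  #    then capture the word inside the cell (capture group 2).
  tr '\n' ' ' < "$1" |
    sed -E -n 's#.*>Status:(<[^>]*>|[[:space:]])*<TD>[[:space:]]*([A-Z]+)[[:space:]]*</TD>.*#\2#p'
}
```

For example, `extract_status page.html` (a hypothetical file name) prints `RESOLVED` for both the two-line layout and the spread-out layout shown above.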

#8 Posted 2004-04-21 12:58

Then use Lynx to download the file and send the output to stdout; you won't see any <> tags, only Status: and RESOLVED. Then use sed to filter out RESOLVED.
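A minimal sketch of the filtering step in this suggestion. It assumes the page has already been rendered to plain text, for example with `lynx -dump URL`, and that the value is a single word either on the same line as `Status:` or on the next non-blank line; how Lynx lays out table cells may vary, so both cases are handled. `status_from_text` is a hypothetical name, and awk is used instead of sed only because it makes looking at the following line simpler.

```shell
# status_from_text: read rendered page text on stdin and print the word that
# follows "Status:", on the same line or on the next non-blank line.
# ASSUMPTION: the value is a single word and only the first "Status:" matters.
status_from_text() {
  awk '
    want && NF { print $1; exit }   # value was on a following line
    /Status:/ {
      sub(/.*Status:[ \t]*/, "")    # drop everything up to and including the label
      if (NF) { print $1; exit }    # value on the same line
      want = 1                      # otherwise take the next non-blank line
    }
  '
}
```

Usage would look like `lynx -dump "$url" | status_from_text`, where `$url` is a placeholder for the page address.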

#9 Posted 2004-04-21 13:35

Which university offers this course?

#10 Posted 2004-04-21 13:39

UNIVERSITY OF TECHNOLOGY, SYDNEY AUSTRALIA
www.uts.edu.au