[Text Processing] Help writing part of a shell script

#1 | Posted 2012-10-04 01:17
<tr>
<td class="rowfollow nowrap" valign="middle" style='padding: 0px'><a href="?cat=402"><img class="c_tvseries" src="pic/cattrans.gif" alt="电视剧(TV Series)" title="电视剧(TV Series)" style="background-image: url(pic/category/chd/nanosofts/catsprites.png);" /></a><img class="si_riph264" src="pic/cattrans.gif" style="background-image: url(pic/category/chd/nanosofts/additional/addsprites.gif);" alt="Encode/x.264" title="Encode/x.264" /></td>
<td class="rowfollow" width="100%" align="left"><table class="torrentname" width="100%"><tr><td class="embedded"><a title="The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS"  href="details.php?id=120814&hit=1"><b>The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS</b></a><br />极速前进( 第二十一季 第1集) [iTunes版] <img class="pro_free" src="pic/trans.gif" alt="Free"  onmouseover="domTT_activate(this, event, 'content', '<b><font class="free">免费</font></b>限时: <b><span title="2012-10-04 02:39:47">5时45分</span></b>', 'trail', false, 'delay',500,'lifetime',3000,'fade','both','styleClass','niceTitle', 'fadeMax',87, 'maxWidth', 300);" /> <span style="color:#3C4954">(限时: 5时45分)</span></td><td width="60" class="embedded" style="text-align: right; " valign="middle"><a href="download.php?id=120814"><img class="download" src="pic/trans.gif" style="margin: 0 2px 3px 0;" alt="download" title="下载本种" /></a><a id="rssdown32"  href="javascript: rssdown(120814,32);" ><img style="margin: 0 2px 3px 0;" class="rssdown" src="pic/trans.gif" alt="Rss Down" title="RSS下载" /></a><a id="bookmark32"  href="javascript: bookmark(120814,32);" ><img style="margin: 0 2px 3px 0;" class="delbookmark" src="pic/trans.gif" alt="Unbookmarked" title="收藏" /></a><br /><a href="http://www.imdb.com/title/tt0285335"><span style="padding-top: 2px;font-size:7pt;color:#3C4954;">IMDb: 7.6</span></a></td>
</tr></table></td><td class="rowfollow"><a href="comment.php?action=add&pid=120814&type=torrent" title="添加评论">0</a></td><td class="rowfollow nowrap"><span title="2012-10-03 20:39:47">14分</span></td><td class="rowfollow">1.34<br />GB</td><td class="rowfollow" align="center"><b><a href="details.php?id=120814&hit=1&dllist=1#seeders"><font color="#bb0000">1</font></a></b></td>
<td class="rowfollow"><b><a href="details.php?id=120814&hit=1&dllist=1#leechers">9</a></b></td>
<td class="rowfollow">0</td>
<td class="rowfollow"><span class="nowrap"><a  href="userdetails.php?id=83776" class='ExtremeUser_Name'><b>feidaoshou</b></a></span></td>
</tr>
<tr>
<td class="rowfollow nowrap" valign="middle" style='padding: 0px'><a href="?cat=405"><img class="c_anime" src="pic/cattrans.gif" alt="动画/动漫(Animations)" title="动画/动漫(Animations)" style="background-image: url(pic/category/chd/nanosofts/catsprites.png);" /></a><img class="si_bdh264" src="pic/cattrans.gif" style="background-image: url(pic/category/chd/nanosofts/additional/addsprites.gif);" alt="Blu-ray/H.264" title="Blu-ray/H.264" /></td>
<td class="rowfollow" width="100%" align="left"><table class="torrentname" width="100%"><tr><td class="embedded"><a title="Fullmetal Alchemist Brotherhood BDrip 1920x1080 Vol 01-Vol 16 Fin x264 AAC-FN@Lv 1"  href="details.php?id=120813&hit=1"><b>Fullmetal Alchemist Brotherhood BDrip 1920x1080 Vol 01-Vol 16 Fin x264 AAC-FN@Lv 1</b></a><br />钢之炼金术师2009 <img class="pro_50pctdown" src="pic/trans.gif" alt="50%" title="50%" /></td><td width="60" class="embedded" style="text-align: right; " valign="middle"><a href="download.php?id=120813"><img class="download" src="pic/trans.gif" style="margin: 0 2px 3px 0;" alt="download" title="下载本种" /></a><a id="rssdown33"  href="javascript: rssdown(120813,33);" ><img style="margin: 0 2px 3px 0;" class="rssdown" src="pic/trans.gif" alt="Rss Down" title="RSS下载" /></a><a id="bookmark33"  href="javascript: bookmark(120813,33);" ><img style="margin: 0 2px 3px 0;" class="delbookmark" src="pic/trans.gif" alt="Unbookmarked" title="收藏" /></a></td>
</tr></table></td><td class="rowfollow"><a href="comment.php?action=add&pid=120813&type=torrent" title="添加评论">0</a></td><td class="rowfollow nowrap"><span title="2012-10-03 20:29:21">25分</span></td><td class="rowfollow">50.55<br />GB</td><td class="rowfollow" align="center"><b><a href="details.php?id=120813&hit=1&dllist=1#seeders">1</a></b></td>
<td class="rowfollow"><b><a href="details.php?id=120813&hit=1&dllist=1#leechers">2</a></b></td>
<td class="rowfollow">0</td>
<td class="rowfollow"><span class="nowrap"><a  href="userdetails.php?id=116047" class='User_Name'><b>YUEchan</b></a></span></td>
</tr>
<tr>
<td class="rowfollow nowrap" valign="middle" style='padding: 0px'><a href="?cat=410"><img class="c_pad" src="pic/cattrans.gif" alt="iPad影视(iP/iPad)" title="iPad影视(iP/iPad)" style="background-image: url(pic/category/chd/nanosofts/catsprites.png);" /></a><img class="si_remuxh264" src="pic/cattrans.gif" style="background-image: url(pic/category/chd/nanosofts/additional/addsprites.gif);" alt="Remux/H.264" title="Remux/H.264" /></td>
<td class="rowfollow" width="100%" align="left"><table class="torrentname" width="100%"><tr><td class="embedded"><a title="And God Created Woman 1956 BluRay 720p iPad AAC x264-CHDPAD"  href="details.php?id=120812&hit=1"><b>And God Created Woman 1956 BluRay 720p iPad AAC x264-CHDPAD</b></a><br />上帝创造女人[原盘制作 德语]</td><td width="60" class="embedded" style="text-align: right; " valign="middle"><a href="download.php?id=120812"><img class="download" src="pic/trans.gif" style="margin: 0 2px 3px 0;" alt="download" title="下载本种" /></a><a id="rssdown34"  href="javascript: rssdown(120812,34);" ><img style="margin: 0 2px 3px 0;" class="rssdown" src="pic/trans.gif" alt="Rss Down" title="RSS下载" /></a><a id="bookmark34"  href="javascript: bookmark(120812,34);" ><img style="margin: 0 2px 3px 0;" class="delbookmark" src="pic/trans.gif" alt="Unbookmarked" title="收藏" /></a></td>
</tr></table></td><td class="rowfollow"><b><a href="details.php?id=120812&hit=1&cmtpage=1#startcomments" onmouseover="domTT_activate(this, event, 'content', document.getElementById('lastcom_34'), 'trail', false, 'delay', 500,'lifetime',3000,'fade','both','styleClass','niceTitle','fadeMax', 87,'maxWidth', 400);">1</a></b></td><td class="rowfollow nowrap"><span title="2012-10-03 20:18:01">36分</span></td><td class="rowfollow">1.40<br />GB</td><td class="rowfollow" align="center"><b><a href="details.php?id=120812&hit=1&dllist=1#seeders"><font color="#ff0000">1</font></a></b></td>
<td class="rowfollow"><b><a href="details.php?id=120812&hit=1&dllist=1#leechers">61</a></b></td>
<td class="rowfollow">0</td>
<td class="rowfollow"><span class="nowrap"><a  href="userdetails.php?id=23554" class='User_Name'><b>wgymyz520</b></a></span></td>
</tr>
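Suppose I have a PHP file with the content above. I want to extract the ID at the end of a URL.
Conditions:
For each <tr>……</tr> block, check whether it contains <img class="pro_free" src="pic/trans.gif" alt="Free"
If it does, take the <a title="The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS" href="details.php?id=120814&hit=1"> in that block, extract 120814, and generate a wget download command (wget -O The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS.torrent http://www.123.com/download.php?id=120814)
The URL ID in the download link is a variable, and so is the name the torrent is saved under.

Any help would be appreciated. I'm a beginner who only knows the most basic shell. I'd like to do something this advanced but don't know how.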

#2 | Posted 2012-10-04 09:30
Last edited by winway1988 on 2012-10-04 14:26
#! /bin/bash
# Print a wget command for every <tr> block that contains the "Free" marker.

sed -nr ':a;\,<tr>.*</tr>,!{N;$!ba};:b;s#<tr>(.*<tr>)#<TR>\1#;tb;h;s#.*(<tr>.*</tr>).*#\1#;\,<tr>.*<img class="pro_free" src="pic/trans.gif" alt="Free".*</tr>,s#<tr>.*<a title="([^"]*)" *href="details\.php\?id=([0-9]*)\&hit=1">.*</tr>#\1:\2#p;g;s#<tr>.*</tr>##;s#<TR>#<tr>#g;$!ba' "$1" | { OIFS="$IFS";IFS=:;while read title id;do echo "wget -O ${title}.torrent http://www.123.com/download.php?id=${id}";done;IFS="$OIFS"; }

$ cat test.sh
#! /bin/bash
# Print a wget command for every <tr> block that contains the "Free" marker.

sed -nr ':a;\,<tr>.*</tr>,!{N;$!ba};:b;s#<tr>(.*<tr>)#<TR>\1#;tb;h;s#.*(<tr>.*</tr>).*#\1#;\,<tr>.*<img class="pro_free" src="pic/trans.gif" alt="Free".*</tr>,s#<tr>.*<a title="([^"]*)" *href="details\.php\?id=([0-9]*)\&hit=1">.*</tr>#\1:\2#p;g;s#<tr>.*</tr>##;s#<TR>#<tr>#g;$!ba' "$1" | { OIFS="$IFS";IFS=:;while read title id;do echo "wget -O ${title}.torrent http://www.123.com/download.php?id=${id}";done;IFS="$OIFS"; }
$ ./test.sh urfile
wget -O The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS.torrent http://www.123.com/download.php?id=120814

#3 | Posted 2012-10-04 12:26
Reply to #2 winway1988

Thanks, but I get an error when I run it:
123.sh: line 8: unexpected EOF while looking for matching `''
123.sh: line 9: syntax error: unexpected end of file
Both errors point at the sed command you gave me. There is no line 9; the file has only 8 lines in total.
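
This pair of messages is what bash typically reports when a single quote is opened and never closed before the end of the file. One way to catch that without executing anything is `bash -n`, which only parses the script. A minimal sketch follows; the two throwaway scripts and the `check_syntax` helper are hypothetical and only illustrate the check.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given script parses cleanly.
# bash -n reads the commands but does not execute them.
check_syntax() {
    bash -n "$1" 2>/dev/null
}

tmpdir=$(mktemp -d)

# Hypothetical broken script: the quote after IFS=: is never closed.
printf '%s\n' '#!/bin/bash' "IFS=:'; echo hello" > "$tmpdir/broken.sh"

# The same line with the stray quote removed.
printf '%s\n' '#!/bin/bash' 'IFS=:; echo hello' > "$tmpdir/fixed.sh"

if check_syntax "$tmpdir/broken.sh"; then broken_status=ok; else broken_status=error; fi
if check_syntax "$tmpdir/fixed.sh"; then fixed_status=ok; else fixed_status=error; fi

echo "broken.sh: $broken_status"
echo "fixed.sh: $fixed_status"

rm -rf "$tmpdir"
```

Running `bash -n 123.sh` on the failing file would print the same two messages without starting sed at all, which makes it easier to see that the problem is in the quoting rather than in the sed program.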
   

#4 | Posted 2012-10-04 13:58
Reply to #3 xiaoyawl
awk '{s=s $0}END{n=split(s,a,/<tr>|<\/tr>/);for(i=1;i<=n;i++)if(a[i]~/<img class="pro_free" src="pic\/trans.gif" alt="Free"/){print gensub(/^<[^<]+<a title="([^"]+)" +href="details.php\?(id=[^&]+).*/,"wget -O \\1.torrent http://www.123.com/download.php?\\2",1,a[i])}}' file
wget -O The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS.torrent http://www.123.com/download.php?id=120814

#5 | Posted 2012-10-07 23:04
$ awk 'BEGIN{RS="";FS="<[/]tr>"}{for(i=1;i<=NF;i++)if($i~/<img class="pro_free" src="pic\/trans.gif" alt="Free"/)print gensub(/.*<a title="([^"]+)"[^>]+id=([0-9]+).*/,"wget -O \\1.torrent http://www.123.com/download.php?id=\\2","G",$i)}' log.txt
wget -O The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS.torrent http://www.123.com/download.php?id=120814
One question: how is wget supposed to run this? The file name contains spaces.
Shouldn't it be wrapped in double quotes?
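
On the quoting question: without quotes the shell splits the title at every space, so wget would receive several separate arguments instead of one file name. A minimal sketch of emitting the command with both the file name and the URL in double quotes follows. `make_wget_cmd` is a hypothetical helper, the host is the placeholder used in this thread, and the sketch assumes the title contains no double quote, `$`, backquote, or backslash.

```shell
#!/bin/bash
# Hypothetical helper: print a wget command for one title/id pair,
# with the output file name and the URL wrapped in double quotes.
# $1 = torrent title, $2 = numeric id
make_wget_cmd() {
    printf 'wget -O "%s.torrent" "http://www.123.com/download.php?id=%s"\n' "$1" "$2"
}

cmd=$(make_wget_cmd 'The Amazing Race S21E01 720p WEB-DL AAC2.0 H.264-KiNGS' 120814)
echo "$cmd"
```

In the awk one-liners the same effect comes from putting escaped quotes around `\\1.torrent` and around the URL in the replacement string.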

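#6 | Posted 2012-10-08 04:38
Reply to #5 L_kernel

I was puzzled by that too, so I didn't dare reply. What I need is to generate the download path and then have wget download the file automatically to a specified location.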

   

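#7 | Posted 2012-10-08 10:42
xiaoyawl wrote on 2012-10-08 04:38
Reply to #5 L_kernel

I was puzzled by that too, so I didn't dare reply. What I need is to generate the download path and then have wget download the file automatically to a specified location ...
There's no need to be afraid to reply. Discussing things with each other is what matters most. Just add the quotes and it will work.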
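
For downloading straight into a chosen directory, one option is to skip the generated command text and call the downloader directly, so the title travels as a single quoted argument. The sketch below assumes, as in the sample from the question, that each row's title link and its `pro_free` image sit on the same physical line, and that titles contain no `/`. `list_free`, `fetch_free`, `DOWNLOADER`, and the target directory are hypothetical names; http://www.123.com is the placeholder host from the question.

```shell
#!/bin/bash
# Print "id<TAB>title" for every input line that carries the pro_free marker.
list_free() {
    awk '
    /<img class="pro_free" src="pic\/trans.gif" alt="Free"/ {
        # Title: text between <a title=" and the next double quote.
        if (!match($0, /<a title="[^"]+"/)) next
        title = substr($0, RSTART + 10, RLENGTH - 11)
        # Id: the digits that follow details.php?id= after the title.
        rest = substr($0, RSTART + RLENGTH)
        if (!match(rest, /details\.php\?id=[0-9]+/)) next
        id = substr(rest, RSTART + 15, RLENGTH - 15)
        print id "\t" title
    }' "$1"
}

# Download every listed torrent into a directory.
# $1 = saved HTML page, $2 = target directory.
# DOWNLOADER defaults to wget; set it to echo for a dry run.
fetch_free() {
    mkdir -p "$2" || return 1
    list_free "$1" | while IFS="$(printf '\t')" read -r id title; do
        "${DOWNLOADER:-wget}" -O "$2/$title.torrent" \
            "http://www.123.com/download.php?id=$id" </dev/null
    done
}

# Example (hypothetical paths):
#   DOWNLOADER=echo fetch_free page.html "$HOME/torrents"   # dry run
#   fetch_free page.html "$HOME/torrents"                   # real download
```

Because the title is never pasted into a command string, spaces in it need no escaping beyond the double quotes around the `-O` argument.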

#8 | Posted 2012-10-11 05:12
Reply to #7 L_kernel

OK, thanks.