Chinaunix

How to match all image URLs in an HTML document

#1, posted 2013-01-07 01:58
Last edited by tavisdxh on 2013-01-07 01:58

The HTML file is as follows:
  1. <p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><strong><span style="color: #ff4200;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;">现在是活动特价.不议价.请别浪费彼此的宝贵时间.想议价的亲们请绕道</span></span></span></strong></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><strong><span style="color: #ff4200;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;">而行谢谢合作!</span></span></span></strong></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><span style="color: #000000;"><strong style="color: #ff0000;font-family: kaiti_gb2312;font-size: 48.0px;line-height: 72.0px;"><span style="background-color: #ffff00;"><span style="background-color: #cc0000;"><span style="background-color: #cccccc;">鞋子偏大一码,建议买小一码。单鞋穿41码,这款买40码即可,棉鞋在出厂的时候已经 留出垫鞋垫的位置了,所以不用特意买大一码哦</span></span></span></strong></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px 
tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><strong><span style="color: #ff4200;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"></span></span></span></strong></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><strong><span style="color: #ff4200;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"></span></span></span></strong><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"><strong>小店默认:申通.中通.需要</strong></span></span></span><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"><strong>指定快递或者以上快递不到,请拍下留</strong></span></span></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"></span></span></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 
0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"><strong>言</strong></span></span></span><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"></span></span></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"><span style="color: #ff0000;"><strong>,</strong></span></span></span></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 0.0px;"><span style="color: #ff6300;"><span style="font-size: 36.0px;"><span style="font-family: microsoft yahei;"><strong>早买早便宜!买到就赚到哦 !下手要快哦哈</strong></span></span></span></p><p style="padding-right: 0.0px;padding-left: 0.0px;padding-bottom: 0.0px;margin: 1.12em 0.0px;word-spacing: 0.0px;font: 14.0px 21.0px tahoma arial 宋体 sans-serif;text-transform: none;color: #404040;text-indent: 0.0px;padding-top: 0.0px;white-space: normal;letter-spacing: normal;background-color: #ffffff;widows: 2;orphans: 2;webkit-text-size-adjust: auto;webkit-text-stroke-width: 
0.0px;">&nbsp;</p><p><strong style="color: #ff6300;font-family: microsoft yahei;font-size: 36.363636px;line-height: 49.090908px;"><font color="#ffffff" style="background-color: #ff0000;">下面这款是冬款棉鞋加绒版200&darr;&darr;&darr;</font></strong><strong style="color: #ff6300;font-family: microsoft yahei;font-size: 36.363636px;line-height: 49.090908px;"><font color="#ffffff" style="background-color: #ff0000;">&darr;&darr;</font></strong><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2vqq.XdJXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img02.sinacdn.com/imgextra/i2/182406504/T23rO_XbFaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2r9K_XihXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2kYS_XixaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2GGi_XgBaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T22Ey9XoVaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2QUO_XiNXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2jBa_XnJXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2v7OZXjxXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2CXOSXm4aXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2xZyEXlXbXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2hKW0XXBXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 
1.5;" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2rbO0Xi8XXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2I8mWXi8aXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2azeAXaJcXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2bG5ZXdpXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img04.sinacdn.com/imgextra/i4/182406504/T22AV3Xh0dXXXXXXXX_!!182406504.jpg"><img align="absmiddle" style="font-size: 12.0px;line-height: 1.5;" src="http://img01.sinacdn.com/imgextra/i1/182406504/T26niPXlpaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2cyK6Xf8XXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2QAmRXjtaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T25C95XgNaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2cTuAXXxcXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2W5dGXftNXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2d0tZXgFdXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2yHCsXnhbXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T29Ue1XnFaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2aGW6XedaXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2EVBcXhFOXXXXXXXX_!!182406504.jpg"><img 
align="absmiddle" src="http://img01.sinacdn.com/imgextra/i1/182406504/T2XWlZXcBdXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T234G6XlxXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img02.sinacdn.com/imgextra/i2/182406504/T2YQeEXnFbXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2cC16XcxXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2qUeQXjdXXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/182406504/T2Us1JXeBcXXXXXXXX_!!182406504.jpg"><img align="absmiddle" src="http://img03.sinacdn.com/imgextra/i3/182406504/T2wgy6XbdaXXXXXXXX_!!182406504.jpg"></p><p>&nbsp;</p><div style="display: none;"><img style="display: none;" src="http://count.superboss.cc/JcxxCollectionRequest?appId=19&tbId=182406504&activityId=73471"></div><p><a name="superbossMealCount_bot_1_end_73471"></a><a name="superbossMealCount_bot_1_start_46041"></a></p><p>&nbsp;</p><div style="display: none;"><img style="display: none;" src="http://count.superboss.cc/JcxxCollectionRequest?appId=19&tbId=182406504&activityId=46041"></div><p><a name="superbossMealCount_bot_1_end_46041"></a></p><p><a name="recommend_bot_1_start_433014"></a></p><p><a name="recommend_bot_1_end_433014"></a></p>';
Requirement: find every JPG URL in it and save them to a queue.
I am using
  imgURLs=re.compile(r'"src="http://(.*?).jpg"').search(albumData).group()
This matches only one URL,
and it also matches
  "http://item.taobao.com/item.htm?id=15040347289&source=superboss&appId=19"><img border="0" height="240" width="240" src="http://img04.taobaocdn.com/bao/uploaded/i4/T1ELIaXcXbXXaRtOQ9_104043.jpg_310x310.jpg"
How should I handle this?

#2, posted 2013-01-07 10:45
  import re
  p = re.compile(r'src="http://.*?\.jpg')  # s holds the HTML text
  for r in p.findall(s):
      print(r)
Reply to #1 tavisdxh
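The loop in #2 prints each match together with its leading src=" prefix, while the original request was to save the URLs to a queue. A minimal sketch of one way to do that, assuming Python 3 and assuming "queue" means the standard-library queue.Queue. The helper name collect_jpg_urls, the [^"]* pattern, and the shortened sample markup are illustrative choices, not taken from the thread:

```python
import re
from queue import Queue

# Hypothetical, shortened stand-in for the page source in #1.
SAMPLE_HTML = (
    '<img align="absmiddle" src="http://img04.sinacdn.com/a/T2vqq_!!1.jpg">'
    '<img style="display: none;" src="http://count.superboss.cc/Req?appId=19">'
    '<img align="absmiddle" src="http://img02.sinacdn.com/b/T23rO_!!2.jpg">'
)

# [^"]* cannot cross a double quote, so a match stays inside one attribute
# value; the capture group makes findall return the URL without src=".
JPG_SRC = re.compile(r'src="(http://[^"]*\.jpg)"')


def collect_jpg_urls(html):
    """Return a Queue holding every .jpg URL found in src attributes."""
    urls = Queue()
    for url in JPG_SRC.findall(html):
        urls.put(url)
    return urls


url_queue = collect_jpg_urls(SAMPLE_HTML)
while not url_queue.empty():
    print(url_queue.get())
```

The tracking image from count.superboss.cc is skipped because its src value does not end in .jpg.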


   

#3, posted 2013-01-09 11:27
Last edited by crifan on 2013-01-09 11:28

Short answer:
To find multiple matches, use findall.
To find a single match, use search.
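A small sketch of the difference, assuming Python 3; the two-image snippet and the example.com URLs are made up for illustration:

```python
import re

# Hypothetical two-image snippet standing in for the real page.
html = '<img src="http://example.com/a.jpg"><img src="http://example.com/b.jpg">'
pattern = re.compile(r'src="([^"]+\.jpg)"')

# search stops at the first match and returns a match object (or None).
first = pattern.search(html)
print(first.group(1))

# findall scans the whole string; with one capture group it returns
# a list containing that group's text for every match.
all_urls = pattern.findall(html)
print(all_urls)
```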

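Details:
I wrote a tutorial on this a while ago:
[Summary] The differences and connections between re.search and re.findall in Python + the differences in re.findall among four cases: named groups, unnamed groups, non-capturing groups, and no groups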

In your case, if you want the image URL that follows src, you can use:
  (?<=src=")https?://[/\w.-]+\.(?:jpg|bmp|png)(?=")
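This matches:
1. URLs that start with http or https.
2. In the rest of the URL, letters, digits, underscores, -, ., and / are allowed. If other characters should be allowed, add them to the character class yourself. The file names in the HTML in #1 contain !, for example, so ! has to be added before those URLs match.
3. For
  src="http://img04.taobaocdn.com/bao/uploaded/i4/T1ELIaXcXbXXaRtOQ9_104043.jpg_310x310.jpg"
it finds:
  http://img04.taobaocdn.com/bao/uploaded/i4/T1ELIaXcXbXXaRtOQ9_104043.jpg_310x310.jpg
rather than:
  http://img04.taobaocdn.com/bao/uploaded/i4/T1ELIaXcXbXXaRtOQ9_104043.jpg
4. File extensions such as bmp, jpg, and png.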
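A minimal sketch of the lookbehind/lookahead approach, assuming Python 3. The character class is written with - last and followed by +, and the extension group is non-capturing so that findall returns whole URLs rather than group tuples. The names IMG_URL and IMG_URL_BANG are hypothetical:

```python
import re

# (?<=src=") and (?=") assert the surrounding quotes without consuming them,
# so each match is just the URL itself.
IMG_URL = re.compile(r'(?<=src=")https?://[/\w.-]+\.(?:jpg|bmp|png)(?=")')

taobao_tag = (
    '<img border="0" height="240" width="240" '
    'src="http://img04.taobaocdn.com/bao/uploaded/i4/'
    'T1ELIaXcXbXXaRtOQ9_104043.jpg_310x310.jpg">'
)
print(IMG_URL.findall(taobao_tag))

# The file names in #1 contain '!', which the class above does not allow,
# so this variant adds it.
IMG_URL_BANG = re.compile(r'(?<=src=")https?://[/\w.!-]+\.(?:jpg|bmp|png)(?=")')

sina_tag = (
    '<img align="absmiddle" src="http://img04.sinacdn.com/imgextra/i4/'
    '182406504/T2vqq.XdJXXXXXXXXX_!!182406504.jpg">'
)
print(IMG_URL.findall(sina_tag))       # empty: '!' is not in the class
print(IMG_URL_BANG.findall(sina_tag))  # the full sinacdn URL
```

Because the lookahead requires a closing quote right after the extension, the .jpg in the middle of the taobao URL is not accepted as the end of the match.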

Related tutorials:
[Tutorial] Python regular expressions in detail
in particular:
[Tutorial] Python regular expressions in detail: (?<=…) positive lookbehind assertion
[Tutorial] Python regular expressions in detail: (?=…) lookahead assertion

#4, posted 2013-01-09 17:22
Reply to #3 crifan

Thank you very much. I learned a lot.

#5, posted 2013-01-09 17:25
Reply to #2 106033177

Thanks! :wink: