Last edited by reb00t on 2015-08-31 23:12
# Python 2 script: list titles, links and post times from best.pconline.com.cn
import sys

import requests
from lxml import etree

# Python 2 only: make UTF-8 the default codec so the Chinese titles print cleanly
reload(sys)
sys.setdefaultencoding("utf8")

r = requests.get('http://best.pconline.com.cn/')
html = r.text
xmlhtml = etree.HTML(html)

# Every entry is a <div> whose id starts with "topic"
content = xmlhtml.xpath('//div[starts-with(@id,"topic")]/div[1]/a[2]/text()')
urllist = xmlhtml.xpath('//div[starts-with(@id,"topic")]/div[1]/a[2]/@href')
lastime = xmlhtml.xpath('//div[starts-with(@id,"topic")]/div[2]/div[2]/span[2]/text()')

data_text = [text for text in content]
data_url = [url for url in urllist]
data_time = [t.strip() for t in lastime]  # trim whitespace around the timestamp

# Print one "title, url, time" line per entry
for i in xrange(len(data_text)):
    print "%s, %s, %s" % (data_text[i], data_url[i], data_time[i])
Result:
2岁啦!聚超值2周年庆之线下沙龙活动 , http://best.pconline.com.cn/youhui/157456.html, 08-27 11:52
LifeVC小芙 足底按摩器-揉捏型, http://best.pconline.com.cn/youhui/159049.html, 08-31 22:51
Thomas Friends托马斯和朋友之 宝宝的第一个托马斯BCX71, http://best.pconline.com.cn/youhui/159048.html, 08-31 22:00
LifeVC丽芙家居 碳钢 双层 洗衣机置物架, http://best.pconline.com.cn/youhui/159047.html, 08-31 21:16
...
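One thing to watch in the print loop: it indexes three separately queried lists, so it quietly relies on all three XPath queries matching the same number of nodes. Below is a minimal Python 3 sketch of that pairing step with an explicit length check. `format_rows` is a hypothetical helper name, not something from the script above, and the whitespace padding around the sample timestamps is an assumption added only to exercise `strip()`.

```python
# Python 3 sketch: pair up three parallel lists of scraped fields.
# `format_rows` is a hypothetical helper, not part of the original script.

def format_rows(titles, urls, times):
    """Return one "title, url, time" string per entry."""
    if not (len(titles) == len(urls) == len(times)):
        # A mismatch means one XPath query matched more nodes than another,
        # so the columns would no longer line up.
        raise ValueError("field lists differ in length: %d, %d, %d"
                         % (len(titles), len(urls), len(times)))
    return ["%s, %s, %s" % (title, url, when.strip())
            for title, url, when in zip(titles, urls, times)]


# Sample values copied from the result listing above; the padding around
# the timestamps is assumed, for illustration only.
titles = ["LifeVC小芙 足底按摩器-揉捏型", "LifeVC丽芙家居 碳钢 双层 洗衣机置物架"]
urls = ["http://best.pconline.com.cn/youhui/159049.html",
        "http://best.pconline.com.cn/youhui/159047.html"]
times = [" 08-31 22:51 ", " 08-31 21:16 "]

rows = format_rows(titles, urls, times)
for row in rows:
    print(row)
```

Raising on a mismatch is a design choice for the sketch; a plain `zip` would instead drop the unmatched trailing entries without any warning.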