I wrote a small Python program to extract content from a web page. The code is as follows:

```python
# encoding:UTF-8
import sys
import re
from urllib2 import Request, urlopen, URLError, HTTPError


def get_packet(url):
    # Download the page and return its raw contents
    packet = urlopen(url)
    content = packet.read()
    return content


def get_data(packet):
    # xiangmu ("item") stays '~' if no match is found
    xiangmu = '~'
    tmp = re.search(r'<a href=.*', packet)
    if tmp is not None:
        xiangmu = tmp.group().strip()
    print xiangmu


if __name__ == '__main__':
    url = 'http://stock.finance.qq.com/corp1/cbsheet.php?zqdm=600787&type=2014'
    packet = get_packet(url)
    if packet == '~':
        sys.exit(0)
    get_data(packet)
```
The output only prints part of the text containing <a href=; the rest is never printed. How can I print all of it?
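A likely cause, and one way to fix it: `re.search` returns only the first match, and because `.` does not match a newline by default, `.*` runs only to the end of that line, so a single line gets printed. `re.findall` returns every non-overlapping match instead. The sketch below is written for Python 3 (where `urllib2` became `urllib.request` and `print` is a function). The pattern, the hypothetical `get_links` helper, the added `encoding` parameter on `get_packet`, and the GBK default (a common encoding for Chinese pages, but not verified for this one) are all assumptions for illustration.

```python
import re
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen

# Assumed pattern: non-greedy ".*?" stops each match at the first "</a>";
# re.S lets "." match newlines so a tag split across lines is still found,
# and re.I also accepts <A HREF=...>.
LINK_RE = re.compile(r'<a\s+href=.*?</a>', re.S | re.I)


def get_packet(url, encoding='gbk'):
    """Fetch the page and decode it to str.

    The default encoding is an assumption; check the page's Content-Type
    header or <meta charset> and adjust it if needed.
    """
    with urlopen(url) as resp:
        return resp.read().decode(encoding, errors='replace')


def get_links(html):
    """Hypothetical helper: return every <a href=...>...</a> fragment, in page order."""
    # re.search would stop at the first match; re.findall collects all of them.
    return [m.strip() for m in LINK_RE.findall(html)]
```

With the original URL, `for link in get_links(get_packet(url)): print(link)` would take the place of the call to `get_data(packet)`. A regular expression is fine for a quick scrape like this; for messier HTML, the standard library's `html.parser` module is more robust.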