#!/usr/bin/env python
# -*- coding: utf-8 -*-
# set the file encoding to utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import urllib2
import re

def getInfoFromPage(pattern, page):
    # find every occurrence of the pattern in the page text
    p = re.compile(pattern, re.UNICODE)
    result = p.findall(page)
    print result
    return result

pat = "手机版"  # any Chinese text that appears on the page
url = 'http://detail.1688.com/offer/42776544.html'
page = urllib2.urlopen(url)
# the page is served as GBK: decode it first, then re-encode as UTF-8
data = page.read().decode('gbk').encode('utf-8')
print data
result = getInfoFromPage(pat, data)
Changing it like this makes it work. The page was originally encoded in GBK.
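The same idea can be checked offline. What follows is only a minimal sketch, assuming Python 3 rather than the Python 2 used in this post; `find_in_page` and `sample_page` are hypothetical names, and the sample bytes stand in for what a GBK-encoded page would return. It shows that the match succeeds only when the bytes are decoded with the page's real encoding.

```python
import re


def find_in_page(pattern, raw_bytes, encoding="gbk"):
    """Decode raw page bytes to text, then return all regex matches."""
    # errors="replace" keeps a wrong encoding guess from raising an exception
    text = raw_bytes.decode(encoding, errors="replace")
    return re.findall(pattern, text)


# Hypothetical stand-in for the bytes of a GBK-encoded page
sample_page = "<a href='#'>手机版</a>".encode("gbk")

# Decoding with the correct encoding finds the text
print(find_in_page("手机版", sample_page))

# Decoding the same bytes as UTF-8 yields garbage, so nothing matches
print(find_in_page("手机版", sample_page, encoding="utf-8"))
```

For a real page, the encoding would normally be taken from the HTTP `Content-Type` header or the page's `<meta>` tag instead of being hard-coded.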
Reply to 9# howema