I am trying to scrape my own Renren friends' status updates, following this article: http://www.pinkyway.info/2010/12/19/fetch-webpage-by-python/, but I get a compile error. My environment is PyDev + Eclipse + Ubuntu 10.10.

```python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
import urllib, urllib2, cookielib

# Build an opener that keeps cookies across requests (needed for the login session)
myCookie = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(myCookie)
src = urllib.urlopen('http://www.baidu.com').read()  # note: unused below

# Login form fields for Renren
post_data = {
    'email': 'xxxx@gmail.com',
    'password': 'xxxxx',
    'origURL': 'http://www.renren.com/home',
    'domain': 'renren.com'
}
req = urllib2.Request('http://www.renren.com/PLogin.do', urllib.urlencode(post_data))
html_src = opener.open(req).read()

parser = BeautifulSoup(html_src)
article_list = parser.find('div', 'feed-list').findAll('article')
for my_article in article_list:
    state = []  # error reported here: Lexical error, encountered "\ua00a0" after ""
    for my_tag in my_article.h3.contents:
        factor = my_tag.string
        if factor is not None:
            factor = factor.replace(u'\xa0', '')
            factor = factor.strip(u'\r\n')
            factor = factor.strip(u'\n')
            state.append(factor)
    print ' '.join(state)
```
复制代码 |
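A "Lexical error" pointing at an `\xa0`-like character usually means an invisible non-breaking space (U+00A0) was pasted into the source file along with the code, and Python's tokenizer rejects it as indentation. A minimal sketch of a checker that locates such characters (the helper name `find_bad_whitespace` and the character set are my own, not from the original post):

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: locate invisible Unicode whitespace that Python's
# tokenizer rejects with a "Lexical error" (e.g. pasted from a web page).
BAD_CHARS = {u'\xa0': 'NO-BREAK SPACE', u'\u3000': 'IDEOGRAPHIC SPACE'}

def find_bad_whitespace(text):
    """Return (line_no, col_no, name) for every suspicious character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), 1):
        for col_no, ch in enumerate(line, 1):
            if ch in BAD_CHARS:
                hits.append((line_no, col_no, BAD_CHARS[ch]))
    return hits

# Simulated paste: the second line is indented with two U+00A0 characters
sample = u"state = []\n\xa0\xa0for my_tag in tags:\n    pass\n"
print(find_bad_whitespace(sample))
# -> [(2, 1, 'NO-BREAK SPACE'), (2, 2, 'NO-BREAK SPACE')]
```

Deleting the flagged characters and re-indenting with ordinary spaces should make the lexical error go away.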
|
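Separately, the `replace`/`strip` chain inside the loop can be collapsed into one small function, since `str.strip()` with no arguments already trims `\r`, `\n`, and surrounding spaces. A sketch (the name `clean_fragment` is mine, not from the original code):

```python
# -*- coding: utf-8 -*-
def clean_fragment(text):
    # Drop no-break spaces (as the original code does), then trim
    # surrounding whitespace including \r and \n in one call.
    return text.replace(u'\xa0', u'').strip()

print(clean_fragment(u'\r\n Hello\xa0World \n'))
# prints: HelloWorld
```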