smalltom30 posted on 2014-09-21 22:15

Basic Python crawler question

Why does following the steps in "Writing a Python crawler from scratch" (http://www.2cto.com/kf/201210/159597.html)
produce the errors below?

>>> import urllib2
>>> request = urllib2.Request(url="www.baidu.com")
>>> result = urllib2.urlopen(request).read()

Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
    result = urllib2.urlopen(request).read()
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 396, in open
    protocol = req.get_type()
File "C:\Python27\lib\urllib2.py", line 258, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.baidu.com
>>> request = urllib2.Request(url="http://www.baidu.com")
>>> result = urllib2.urlopen(request).read()

Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
    result = urllib2.urlopen(request).read()
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 422, in _open
    '_open', req)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError: <urlopen error >

Could someone advise? Is this network-related? I can open Baidu in a browser without any problem.
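The first traceback is urllib2 rejecting a URL with no scheme: it cannot tell whether `www.baidu.com` means HTTP, FTP, or something else. A minimal guard, sketched here with Python 3's `urllib.parse` (the successor of the Python 2 modules used above), would prepend a scheme before opening:

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prepend a scheme if the URL lacks one, so urlopen can dispatch it."""
    if not urlparse(url).scheme:
        return "%s://%s" % (default, url)
    return url

print(ensure_scheme("www.baidu.com"))        # http://www.baidu.com
print(ensure_scheme("http://www.baidu.com")) # http://www.baidu.com
```

With the scheme supplied, `urlopen` knows which handler to dispatch to, which is exactly why the second attempt in the traceback got past the `ValueError`.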

smalltom30 posted on 2014-09-21 22:34

I checked my network settings and, incredibly, a proxy was enabled. The link below should help in a company environment, where traffic has to go through a proxy:
http://stackoverflow.com/questions/5620263/using-an-http-proxy-python
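The proxy can also be configured explicitly in code instead of relying on the system settings. A sketch using Python 3's `urllib.request` (the urllib2 successor); the proxy address below is a placeholder, not a real host:

```python
import urllib.request

# Placeholder address: substitute your company's proxy host and port.
proxies = {"http": "http://proxy.example.com:8080",
           "https": "http://proxy.example.com:8080"}

proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)
# opener.open("http://www.baidu.com") would now route through the proxy;
# urllib.request.install_opener(opener) makes it the default for urlopen().
```

The same `ProxyHandler`/`build_opener` pattern exists in Python 2's urllib2, which is what the Stack Overflow answer above shows.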

smalltom30 posted on 2014-09-21 22:37

This example is also useful:

from urllib import urlopen
from time import ctime
ticks = ('YHOO', 'EBAY', 'AMZN')

import csv
URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'

print '\nPrices quoted as of:', ctime()
print '\nTICKER'.ljust(9), 'PRICE'.ljust(8), 'CHG'.ljust(5), '%AGE'
print '----'.ljust(8), '----'.ljust(8), '----'.ljust(5), '----'
u = urlopen('http://quote.yahoo.com/d/quotes.csv?s=GOOG,EBAY&f=sl1c1p2')

for row in u:
    # each row looks like: "GOOG",559.08,-1.45,"-0.26%"
    tick, price, chg, per = row.split(',')
    print tick.strip('"').ljust(7), \
          ('%.2f' % round(float(price), 2)).rjust(7), \
          chg.rjust(6), per.rstrip().strip('"').rjust(6)
u.close()


"""
u = urlopen(URL % ','.join(ticks))
for row in csv.DictReader(u):
      print row
f.close()

import csv
u = urlopen ('http://download.finance.yahoo.com/d/quotes.csv?s=GOOG&f=sl1d1t1c1ohgv')
for row in u:
      print row
%s %s % 'abc' 'cde'
round(float(price),2))


"""

smalltom30 posted on 2014-09-22 11:56

http://blog.sina.com.cn/s/blog_7ed3ed3d010146tl.html
This one is good: with it, stock data can be fetched even from behind the company proxy.

smalltom30 posted on 2014-09-22 12:06

Last edited by smalltom30 on 2014-09-22 12:06

Chinese-encoding issues and noteapp editing problems: http://www.cnblogs.com/rollenholt/archive/2011/08/01/2123889.html

smalltom30 posted on 2014-09-22 15:30

http://jingyan.baidu.com/article/a3aad71aac81e0b1fa009677.html
A very simple way to batch-delete blank rows in Excel
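If the sheet is first exported to CSV, the same blank-row cleanup can also be scripted in Python (a sketch; the sample data is made up):

```python
import csv
import io

def drop_blank_rows(rows):
    """Keep only rows that have at least one non-empty, non-whitespace cell."""
    return [row for row in rows if any(cell.strip() for cell in row)]

data = "a,b\n,\n ,\nc,d\n"
cleaned = drop_blank_rows(csv.reader(io.StringIO(data)))
print(cleaned)  # [['a', 'b'], ['c', 'd']]
```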

smalltom30 posted on 2014-09-22 16:29

print '----'.ljust(8),'----'.ljust(8),'----'.ljust(5),'----'

smalltom30 posted on 2014-09-22 16:30

The smiley in my earlier post actually stands for 8)
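For the record, `str.ljust` pads a string with spaces on the right up to the given width, which is what keeps the columns in the quote table aligned:

```python
# Build the column header: each field left-justified in a fixed-width slot.
header = 'TICKER'.ljust(9) + 'PRICE'.ljust(8) + 'CHG'.ljust(5) + '%AGE'
print(header)  # TICKER   PRICE   CHG  %AGE
```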

圣西罗门柱 posted on 2014-09-22 22:46

url="http://www.baidu.com"

smalltom30 posted on 2014-09-23 15:42

http://www.jb51.net/article/44070.htm
How to do SSH remote access in Python with paramiko; the explanation is quite thorough.