smalltom30 posted on 2014-09-21 22:15

Basic Python crawler question

Why does following the steps in "Writing a Python crawler from scratch" (http://www.2cto.com/kf/201210/159597.html)
produce the errors below?

>>> import urllib2
>>> request = urllib2.Request(url="www.baidu.com")
>>> result = urllib2.urlopen(request).read()

Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
    result = urllib2.urlopen(request).read()
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 396, in open
    protocol = req.get_type()
File "C:\Python27\lib\urllib2.py", line 258, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.baidu.com
>>> request = urllib2.Request(url="http://www.baidu.com")
>>> result = urllib2.urlopen(request).read()

Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
    result = urllib2.urlopen(request).read()
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 422, in _open
    '_open', req)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError: <urlopen error >

Could someone advise? Is this network-related? I can open Baidu in a browser without any problem.
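The first traceback is urllib2 rejecting a URL with no scheme: it cannot tell whether `www.baidu.com` means HTTP, FTP, or something else. A minimal guard, sketched here with Python 3's `urllib.parse` (the successor of the Python 2 modules used above), would prepend a scheme before opening:

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prepend a scheme if the URL lacks one, so urlopen can dispatch it."""
    if not urlparse(url).scheme:
        return "%s://%s" % (default, url)
    return url

print(ensure_scheme("www.baidu.com"))        # http://www.baidu.com
print(ensure_scheme("http://www.baidu.com")) # http://www.baidu.com
```

With the scheme supplied, `urlopen` knows which handler to dispatch to, which is exactly why the second attempt in the traceback got past the `ValueError`.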

smalltom30 posted on 2014-09-21 22:34

I checked my network settings and, incredibly, a proxy was enabled. The link below should help in a company environment, where traffic has to go through a proxy:
http://stackoverflow.com/questions/5620263/using-an-http-proxy-python
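The proxy can also be configured explicitly in code instead of relying on the system settings. A sketch using Python 3's `urllib.request` (the urllib2 successor); the proxy address below is a placeholder, not a real host:

```python
import urllib.request

# Placeholder address: substitute your company's proxy host and port.
proxies = {"http": "http://proxy.example.com:8080",
           "https": "http://proxy.example.com:8080"}

proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)
# opener.open("http://www.baidu.com") would now route through the proxy;
# urllib.request.install_opener(opener) makes it the default for urlopen().
```

The same `ProxyHandler`/`build_opener` pattern exists in Python 2's urllib2, which is what the Stack Overflow answer above shows.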

smalltom30 posted on 2014-09-21 22:37

This example is also useful:

from urllib import urlopen
from time import ctime
ticks = ('YHOO', 'EBAY', 'AMZN')

import csv
URL = 'http://quote.yahoo.com/d/quotes.csv?s=%s&f=sl1c1p2'

print '\nPrices quoted as of:', ctime()
print '\nTICKER'.ljust(9), 'PRICE'.ljust(8), 'CHG'.ljust(5), '%AGE'
print '----'.ljust(8), '----'.ljust(8), '----'.ljust(5), '----'
u = urlopen('http://quote.yahoo.com/d/quotes.csv?s=GOOG,EBAY&f=sl1c1p2')

for row in u:
    # each row looks like: "GOOG",559.08,-1.45,"-0.26%"
    tick, price, chg, per = row.split(',')
    print tick.strip('"').ljust(7), \
          ('%.2f' % round(float(price), 2)).rjust(7), \
          chg.rjust(6), per.rstrip().strip('"').rjust(6)
u.close()


"""
u = urlopen(URL % ','.join(ticks))
for row in csv.DictReader(u):
      print row
f.close()

import csv
u = urlopen ('http://download.finance.yahoo.com/d/quotes.csv?s=GOOG&f=sl1d1t1c1ohgv')
for row in u:
      print row
%s %s % 'abc' 'cde'
round(float(price),2))


"""

smalltom30 posted on 2014-09-22 11:56

http://blog.sina.com.cn/s/blog_7ed3ed3d010146tl.html
This one is good: with it, stock data can be fetched even from behind the company proxy.

smalltom30 posted on 2014-09-22 12:06

Last edited by smalltom30 on 2014-09-22 12:06

Chinese-encoding issues and noteapp editing problems: http://www.cnblogs.com/rollenholt/archive/2011/08/01/2123889.html

smalltom30 posted on 2014-09-22 15:30

http://jingyan.baidu.com/article/a3aad71aac81e0b1fa009677.html
A very simple way to batch-delete blank rows in Excel
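If the sheet is first exported to CSV, the same blank-row cleanup can also be scripted in Python (a sketch; the sample data is made up):

```python
import csv
import io

def drop_blank_rows(rows):
    """Keep only rows that have at least one non-empty, non-whitespace cell."""
    return [row for row in rows if any(cell.strip() for cell in row)]

data = "a,b\n,\n ,\nc,d\n"
cleaned = drop_blank_rows(csv.reader(io.StringIO(data)))
print(cleaned)  # [['a', 'b'], ['c', 'd']]
```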

smalltom30 posted on 2014-09-22 16:29

print '----'.ljust(8),'----'.ljust(8),'----'.ljust(5),'----'

smalltom30 posted on 2014-09-22 16:30

The smiley in my earlier post actually stands for 8)
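For the record, `str.ljust` pads a string with spaces on the right up to the given width, which is what keeps the columns in the quote table aligned:

```python
# Build the column header: each field left-justified in a fixed-width slot.
header = 'TICKER'.ljust(9) + 'PRICE'.ljust(8) + 'CHG'.ljust(5) + '%AGE'
print(header)  # TICKER   PRICE   CHG  %AGE
```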

圣西罗门柱 posted on 2014-09-22 22:46

url="http://www.baidu.com"

smalltom30 posted on 2014-09-23 15:42

http://www.jb51.net/article/44070.htm
How to do SSH remote access in Python with paramiko; the explanation is quite thorough.