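I have just started learning Python and wrote a small tool that downloads forum attachments by walking through every page and parsing the attachment information.
However, a BadStatusLine exception is raised at random while requesting some page (tens of thousands of pages have to be requested in total).
The program is structured roughly like this: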
import httplib

conn = httplib.HTTPConnection("www.xxx.com", timeout=30)
try:
    conn.request("GET", "xxx.html", "", header)
    r1 = conn.getresponse()
    ....
except httplib.HTTPException as ex:
    print r1.status, r1.reason, ex
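When it runs, it frequently raises BadStatusLine("''",). Packet capture shows that in these cases the server answers the program's GET request with only an ACK and then sends no data for several seconds.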
Traceback (most recent call last):
  File "F:\Python27\get_file.py", line 27, in <module>
    r1 = conn.getresponse()
  File "F:\Python27\lib\httplib.py", line 1027, in getresponse
    response.begin()
  File "F:\Python27\lib\httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "F:\Python27\lib\httplib.py", line 371, in _read_status
    raise BadStatusLine(line)
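Given that I cannot change how the server behaves, how can I keep my program from exiting on its own? Even without the exception handling, the problem still occurs.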
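
One possible approach, as a minimal sketch rather than a definitive fix: catch the exception for each page, retry a few times on a fresh connection, and move on to the next page if every attempt fails. Note that in the structure shown above, `r1` is never assigned when `getresponse()` raises, so printing `r1.status` inside the `except` block would itself fail with a NameError and still end the program. The names `fetch_page`, `retries`, `delay` and `connect` below are hypothetical and not from the original tool; the import fallback is only there so the same sketch also runs on Python 3, where `httplib` is called `http.client`.

```python
import socket
import time

try:
    import httplib  # Python 2, as used in the post
except ImportError:
    import http.client as httplib  # same module under its Python 3 name


def fetch_page(host, path, headers=None, retries=3, delay=2.0,
               connect=httplib.HTTPConnection):
    """Return (status, reason, body), or None if every attempt failed.

    All parameter names here are assumptions made for this sketch.
    """
    for attempt in range(1, retries + 1):
        # Open a fresh connection per attempt; the old one is unusable
        # once the server has failed to send a status line.
        conn = connect(host, timeout=30)
        try:
            conn.request("GET", path, headers=headers or {})
            resp = conn.getresponse()
            return resp.status, resp.reason, resp.read()
        except (httplib.HTTPException, socket.error) as ex:
            # BadStatusLine is a subclass of HTTPException. No response
            # object exists at this point, so only the exception is reported.
            print("attempt %d for %s failed: %r" % (attempt, path, ex))
            if attempt < retries:
                time.sleep(delay)
        finally:
            conn.close()
    return None
```

In the page loop, a result of `None` can then be logged and skipped, so a single unresponsive page does not stop the whole crawl.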