This is the code I am using:

    #coding:utf-8
    import httplib2

    def fetch(url):
        http_header = {'User-Agent':'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1',
                       'Referer':'http://www.baidu.com/',
                       'Host':'baidu.com'}
        h = httplib2.Http('.cache')

        print ("Start downloading data....")
        response, content = h.request(url,headers = http_header)
        print ("Finish downloading data...")
        print(response['-content-encoding'])
        try:
            with open('baidu.html',"w") as dig:
                print(content,file = dig)
        except IOError as err:
            print('File error: ' + str(err))

    if __name__ == "__main__":
        fetch("http://www.baidu.com/")
This is part of the contents of the resulting baidu.html file:

    url(\'http://www.baidu.com/img/shadu_7b9f89289d791938dc0eb5d94a0ad4d2.png\');background-repeat:no-repeat;padding:2px 0 2px 23px;" data-linkid="3">\xe8\xae\xa9\xe4\xb8\x8a\xe7\xbd\x91\xe6\x9b\xb4\xe5\xae\x89\xe5\x85\xa8\xef\xbc\x8c\xe7\xab\x8b\xe5\x8d\xb3\xe4\xb8\x8b\xe8\xbd\xbd\xe7\x99\xbe\xe5\xba\xa6\xe6\x9d\x80\xe6\xaf\x92</a></p></div></div><div id="ftCon"><div id="ftConw"><p ><a id="seth" onClick="h(this)" href="/" onmousedown="return ns_c({\'fm\':\'behs\',\'tab\':\'homepage\',\'pos\':0})">\xe6\x8a\x8a\xe7\x99\xbe\xe5\xba\xa6\xe8\xae\xbe\xe4\xb8\xba\xe4\xb8\xbb\xe9\xa1\xb5</a><a id="setf" href="http://www.baidu.com/cache/sethelp/index.html"
All of the Chinese text in it has turned into sequences like \xe8\xae\xa9\xe4\xb8\x8a\xe7\xbd\x91\xe6\x9b\xb4\xe5\xae\x89\xe5\x85\xa8.
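
A possible explanation, assuming this runs under Python 3 (the `print(..., file=...)` call suggests it does): `h.request()` hands back `content` as a `bytes` object, and `print()` writes the `repr` of a `bytes` object, which is where the `\x..` escapes (and the `\'` escapes in the HTML) would come from. The sketch below reproduces the effect without any network access. The `content` value is a stand-in built from one of the escaped strings in the output above, and the names `broken` and `fixed` are only illustrative.

```python
import io

# Stand-in for what h.request() returns as `content`: UTF-8 encoded bytes.
content = "把百度设为主页".encode("utf-8")

# Same call as in the posted code: print() converts the bytes object with
# str(), which yields its repr, so the escapes end up in the file.
broken = io.StringIO()
print(content, file=broken)

# Decoding the bytes first (assuming the page is UTF-8) gives real text.
fixed = io.StringIO()
print(content.decode("utf-8"), file=fixed)

print(broken.getvalue())
print(fixed.getvalue())
```

If this is indeed the cause, then in `fetch()` it should be enough either to open the file with `open('baidu.html', 'w', encoding='utf-8')` and print `content.decode('utf-8')`, or to open it with `'wb'` and call `dig.write(content)` so the raw bytes are stored unchanged. Both variants assume the page is served as UTF-8, which the byte sequences above are consistent with.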