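As the title says: I know the keyword data for Baidu's search suggestion dropdown comes from a URL like http://suggestion.baidu.com/su?wd={keyword}, but when I request it with the requests library's get, the response comes back empty, even though I also spoofed the User-Agent. How should I handle this?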
My code is as follows:

    #coding=utf-8
    import requests

    def get_box(word):
        url = 'http://suggestion.baidu.com/su?wd=%s&p=3&cb=window.bdsug.sug&from=superpage' % word
        headers = {
            'User-Agent': 'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+GTB7.1;+.NET+CLR+2.0.50727)'
        }
        r = requests.post(url, headers = headers)
        print(r.status_code)
        print(r.content)

    get_box('途牛')
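One thing that may matter: the snippet above calls requests.post, while the question describes a GET. Below is a minimal sketch of the GET variant. It assumes the endpoint still answers plain GET requests and that the body is GBK-encoded; neither is verified here. The names build_suggest_url and fetch_suggest_raw, the short User-Agent string, and the timeout are placeholders of mine, not part of the original code.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; the extra p/cb/from parameters are left out here.
SUGGEST_URL = "http://suggestion.baidu.com/su"


def build_suggest_url(word, encoding="utf-8"):
    """Return the suggestion URL with the keyword percent-encoded.

    The encoding is an assumption: pass encoding="gbk" if the server
    turns out to expect GBK-encoded keywords.
    """
    return SUGGEST_URL + "?" + urlencode({"wd": word}, encoding=encoding)


def fetch_suggest_raw(word, timeout=5):
    """Send a GET (not a POST) and return the decoded response body."""
    import requests  # third-party; imported lazily so the URL helper works without it

    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder UA string
    r = requests.get(build_suggest_url(word), headers=headers, timeout=timeout)
    r.raise_for_status()
    # Assumption: the body is GBK-encoded; undecodable bytes are replaced.
    return r.content.decode("gbk", errors="replace")
```

Calling build_suggest_url('途牛') yields http://suggestion.baidu.com/su?wd=%E9%80%94%E7%89%9B, which avoids interpolating raw non-ASCII bytes into the URL with %s.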
I also found a PHP version while searching online. I don't understand it, so it is for reference only:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <link type="text/css" rel="stylesheet"
          href="http://zone.wooyun.org/themes/wooyun/css/style.css"/></head>
    <body>
    <?php
    /*
    author: VIP
    date: 2013-2-26
    */
    // Avoid an undefined-index notice when no keyword was submitted.
    $word = isset($_GET['word']) ? $_GET['word'] : '';
    if ($word == "")
    {
        echo <<<EOF
    <form action="" method="get">
    <p>Keyword: <input type="text" name="word" /></p>
    <input type="submit" value="Fetch" />
    </form>
    EOF;
    }
    else
    {
        $data = file_get_contents('http://suggestion.baidu.com/su?wd='.$word);
        $data = mb_convert_encoding($data, 'UTF-8', 'UTF-8,GBK,GB2312,BIG5');
        $data_temp = strpos($data, "x");
        $data = substr_replace($data, "", $data_temp, 17);
        $data = trim($data, ");");
        $data = trim($data, "{");
        $data = preg_replace("/q:.+?.e,/", '', $data);
        $data = str_replace("[", "", $data);
        $data = str_replace("]", "", $data);
        $data = "[".$data."]";
        $data = str_replace(",", "},s:", $data);
        $data = str_replace("s:", "{\"s\":", $data); // convoluted massaging so the string becomes valid JSON
        $dc = json_decode($data);
        for ($n = 0; $n <= 9; $n++)
        {
            $wd[$n] = $dc[$n]->s;
            echo "<br>".$wd[$n];
        }
    }
    ?>
    </body>
    </html>
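The string manipulation in the PHP snippet appears to strip the callback wrapper and the q:/p: fields so that only the s: list remains as JSON. A shorter Python sketch of the same idea pulls out just the s:[...] array with a regular expression. The response shape is an assumption inferred from the fields the PHP code removes (q, p, s). The function name parse_suggestions and the sample body in the usage note are made up for illustration.

```python
import json
import re

# Matches the s:[...] member of the JSONP payload, with or without quoted keys.
# Limitation: the non-greedy match stops at the first "]", so a suggestion
# that itself contains "]" would be cut short.
_S_ARRAY = re.compile(r'[,{]\s*"?s"?\s*:\s*(\[.*?\])', re.S)


def parse_suggestions(body):
    """Return the list of suggestion strings, or [] if none is found.

    Assumes the array holds double-quoted strings, so it is valid JSON
    on its own even though the surrounding object is not.
    """
    match = _S_ARRAY.search(body)
    if match is None:
        return []
    return json.loads(match.group(1))
```

For a hypothetical body such as window.bdsug.sug({q:"tuniu",p:false,s:["tuniu.com","tuniu travel"]}); this returns ['tuniu.com', 'tuniu travel']. Parsing only the array sidesteps the unquoted keys, which json.loads would reject.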