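I posted some page-scraping code earlier, but I have found that it can fetch pages yet cannot fetch JSON data.
I am still looking into it. Here is the raw exchange so we can work through it together; I have not played with this before.
I am also still searching the forum to see whether anyone has done or asked this before. Marking it here for now.
Question: how do I fetch the JSON data that is returned?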
Request packet:
- POST /tools/web-sites-on-web-server/php/get-web-sites-on-web-server-json-data.php HTTP/1.1
- Host: www.yougetsignal.com
- User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0 Iceweasel/19.0.2
- Accept: text/javascript, text/html, application/xml, text/xml, */*
- Accept-Language: en-US,en;q=0.5
- Accept-Encoding: gzip, deflate
- X-Requested-With: XMLHttpRequest
- X-Prototype-Version: 1.6.0
- Content-Type: application/x-www-form-urlencoded; charset=UTF-8
- Referer: http://www.yougetsignal.com/tools/web-sites-on-web-server/
- Content-Length: 22
- Cookie:
- Connection: keep-alive
- Pragma: no-cache
- Cache-Control: no-cache
- remoteAddress=sohu.com
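For reference, the captured request could probably be rebuilt along these lines. This is only a sketch using Python's standard `urllib`; the `http://` scheme is an assumption taken from the Referer header, and `build_request` is a made-up helper name. It only constructs the request and sends nothing.

```python
import urllib.parse
import urllib.request

# Host + path from the capture; the http:// scheme is assumed from the Referer.
URL = ("http://www.yougetsignal.com/tools/web-sites-on-web-server/"
       "php/get-web-sites-on-web-server-json-data.php")


def build_request(remote_address):
    """Build (but do not send) a POST shaped like the captured one."""
    # Form-encode the body exactly as the browser did: remoteAddress=<value>
    body = urllib.parse.urlencode({"remoteAddress": remote_address}).encode("ascii")
    headers = {
        "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64; rv:19.0) "
                       "Gecko/20100101 Firefox/19.0 Iceweasel/19.0.2"),
        "Accept": "text/javascript, text/html, application/xml, text/xml, */*",
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Referer": "http://www.yougetsignal.com/tools/web-sites-on-web-server/",
        # Accept-Encoding is left out on purpose: urllib does not
        # decompress gzip bodies automatically.
    }
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


req = build_request("sohu.com")
print(req.get_method(), len(req.data))  # POST 22, matching Content-Length: 22
```

Whether the server really requires the `X-Requested-With` and `Referer` headers is not known from the capture; they are copied here only because the browser sent them.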
Returned data:
- HTTP/1.1 200 OK
- Date: Wed, 10 Apr 2013 13:48:43 GMT
- Server: Apache
- Vary: Accept-Encoding
- Content-Length: 16773
- Keep-Alive: timeout=2, max=100
- Connection: Keep-Alive
- Content-Type: text/html
- {"status":"Success", "resultsMethod":"scrape", "lastScrape":"2013-04-10 06:48:47", "domainCount":"516", "remoteAddress":"sohu.com", "remoteIpAddress":"61.135.181.175", "domainArray":[["029love.sohu.com.cn", ""], ["0755qq.com", ""], ["17173.com", ""], ["17173.net", ""], ["17173.net.cn", ""], ["2004.sohu.com", ""], ["2006.sports.sohu.com", ""], ["2008.sohu.com.cn", ""], ["2008.sohu.comwww.sohu.com.cn", ""], ["2010.s.sohu.com", ""], ["286818458.blog.sohu.com.cn", ""], ["394003276.sohu.com.cn", ""], ["512.sohu.com", ""], ["51center.com", ""], ["55039106.blog.sohu.com.cn", ""], ["60.sohu.com", ""], ["60269.show.sohu.com.cn", ""], ["656979.com", ""], ["8888love-me.blog.sohu.com.cn", ""], ["91hdw.com", ""], ["abcddfe.blog.sohu.com.cn", ""], ["ablum.chinaren.com.cn", ""], ["add.sohu.com", ""], ["ai-weiyang.blog.sohu.com.cn", ""], ["air.sohu.com", ""], ["aiyits.sohu.com.cn", ""], ["akitsuki.blog.sohu.com.cn", ""], ["alooflau.blog.sohu.com.cn", ""], ["alumni.chinaren.com.cn", ""], ["angel.2010.sohu.com", ""], ["apple.sohu.com.cn", ""], ["apple86.blog.sohu.com.cn", ""], ["arcticsnowfanily.blog.sohu.com.cn", ""], ["art.2008.sohu.com", ""], ["art.sohu.com", ""]
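Note that the server labels the body `Content-Type: text/html` even though it is JSON, so a client probably has to parse the body as JSON itself rather than rely on the header. A minimal sketch of that step in Python follows; `parse_domains` is a made-up helper name, and `sample` is a shortened, hand-made body in the same shape as the capture, not real output.

```python
import gzip
import json


def parse_domains(raw_body, content_encoding=""):
    """Decode a response body shaped like the capture and list its domains."""
    # Only needed if the request advertised gzip and the server used it.
    if content_encoding.lower() == "gzip":
        raw_body = gzip.decompress(raw_body)
    # Ignore the text/html label and parse the body as JSON directly.
    data = json.loads(raw_body.decode("utf-8"))
    if data.get("status") != "Success":
        raise ValueError("lookup failed: %r" % data.get("status"))
    # Each domainArray entry is a two-item list; the first item is the domain.
    return [entry[0] for entry in data.get("domainArray", [])]


# Shortened sample for illustration only (two entries instead of 516).
sample = (b'{"status":"Success", "domainCount":"2", "remoteAddress":"sohu.com", '
          b'"domainArray":[["17173.com", ""], ["add.sohu.com", ""]]}')
print(parse_domains(sample))  # ['17173.com', 'add.sohu.com']
```

With a live request, the bytes would come from something like `urllib.request.urlopen(req).read()`, where `req` is a POST carrying `remoteAddress=sohu.com`, and the encoding from the response's `Content-Encoding` header.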