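Someone in the chat group asked again today how to get the real file name behind a download link. In the spirit of learning, I looked into it.
Question: get the real name of the file downloaded from the URL http://www.epnf.cn/download/job.php?job=download&id=170&did=0
Answer: I did not know how this worked at first, so I investigated and solved it:
1. First, I captured the traffic to see how the browser's download manages to get the real file name: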
GET /download/job.php?job=download&id=170&did=0 HTTP/1.1
Host: www.epnf.cn
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4 GTB7.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

Server response:
HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Wed, 23 Jun 2010 03:37:57 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-Powered-By: PHP/5.2.9-2
Set-Cookie: USR=eSmNHcVu%09%091277264277%09http%3A%2F%2Fwww.epnf.cn%2Fdownload%2Fjob.php%3Fjob%3Ddownload%26id%3D170%26did%3D0
location:http://q.epnf.cn/rar/d14.rar
Content-type: text/html
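(Some packets are omitted here...)
Looking at the response above, we can see that the site's server-side PHP script sets the Location header and redirects (302) to the resource file, which exposes the file's real URL: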
http://q.epnf.cn/rar/d14.rar
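Once the redirect target is known, the file name is just the last segment of its path. A minimal sketch of that step in Python 3, using only the standard library; the helper name filename_from_url is hypothetical and not part of the original post:

```python
import posixpath
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded.

    Hypothetical helper for illustration; it ignores the query string
    and returns an empty string if the path ends with a slash.
    """
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


print(filename_from_url("http://q.epnf.cn/rar/d14.rar"))  # prints d14.rar
```

Applied to the original download link, the same helper would only yield job.php, which is why the Location header has to be followed first.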
OK, the basic principle is clear. Now let's use Python to reproduce this process of discovering the resource's real URL:
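import httplib  # Python 2 standard library (http.client in Python 3)

# Connect to the host that serves the download script
conn = httplib.HTTPConnection("www.epnf.cn")
# HEAD returns only the response headers, which is all we need here
conn.request("HEAD", "/download/job.php?job=download&id=170&did=0")
res = conn.getresponse()
# Print the headers as a list of (name, value) tuples
print res.getheaders()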
Running it gives:
[('x-powered-by', 'ASP.NET, PHP/5.2.9-2'), ('set-cookie', 'USR=7CmRO9pq%09%091277300980%09http%3A%2F%2Fwww.epnf.cn%2Fdownload%2Fjob.php%3Fjob%3Ddownload%26id%3D170%26did%3D0'), ('server', 'Microsoft-IIS/6.0'), ('connection', 'close'),('location', 'http://q.epnf.cn/rar/d14.rar'), ('date', 'Wed, 23 Jun 2010 13:49:40 GMT'),('content-type', 'text/html')]
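There is the result: a list of tuples. Extract the value: resUrl=res.getheaders()[4][1]
Note that the index 4 depends on the order of the headers in this particular response; res.getheader('location') looks the header up by name and does not rely on that order.
Done!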
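For readers on Python 3, where httplib was renamed http.client, the same idea might look roughly like the sketch below. It finds the Location header by name instead of by position. The function names get_header and resolve_redirect are hypothetical, and the sketch assumes a plain-HTTP URL and a server that answers HEAD with the same redirect it sends for GET:

```python
import http.client
from urllib.parse import urlsplit


def get_header(headers, name, default=None):
    """Case-insensitive lookup in a list of (name, value) tuples."""
    wanted = name.lower()
    for key, value in headers:
        if key.lower() == wanted:
            return value
    return default


def resolve_redirect(url, timeout=10):
    """Send a HEAD request and return the Location header, or None.

    Assumes an http:// URL; https would need HTTPSConnection instead.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        return get_header(res.getheaders(), "Location")
    finally:
        conn.close()

# Example call (needs network access, so it is left commented out):
# resolve_redirect("http://www.epnf.cn/download/job.php?job=download&id=170&did=0")
```

Looking the header up by name keeps the code working if the server adds, removes, or reorders headers, which a fixed index such as [4][1] would not survive.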