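As the title says: I need to extract web page content. I start from one page to obtain a cookie, then visit a series of links provided on that page.
The program is roughly as follows: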
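use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# The start page and a link found on it, taken here from the command line.
my ($url1, $url2) = @ARGV;

my $ua = LWP::UserAgent->new;
my $cookie_jar = HTTP::Cookies->new;
# With a cookie jar attached, the agent stores cookies from each response
# and sends them back with later requests.
$ua->cookie_jar($cookie_jar);

# First request: obtain the cookie.
my $request = HTTP::Request->new(GET => $url1);
my $response = $ua->request($request);

# Second request: the agent adds the Cookie header itself, so explicit
# extract_cookies/add_cookie_header calls are not needed.
$request = HTTP::Request->new(GET => $url2);
$response = $ua->request($request);
my $content = $response->content;
print $content;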
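The cookie should already have been sent when requesting url2. However, the url2 page may run a JavaScript check while it initializes, to test whether the browser supports cookies. The script is as follows: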
var cookieEnabled = (navigator.cookieEnabled)
if (typeof navigator.cookieEnabled == "undefined" && !cookieEnabled) {
document.cookie = "testcookie"
cookieEnabled = (document.cookie.indexOf("testcookie") != -1)
}
if (!cookieEnabled) {
alert('You must enable cookie support for your browser to use this site.');
putSessionAttribute("cookieEnabled", "no");
} else {
putSessionAttribute("cookieEnabled", "yes");
}
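To see what this check actually decides, here is a minimal sketch that wraps the same logic in a function so it can run outside a browser, for example in Node.js. The names detectCookieSupport, nav and doc, and the mock objects, are hypothetical and only for illustration. The site's own putSessionAttribute call is left out, and the function simply returns the "yes"/"no" value that would be passed to it. Note that this logic runs only in a client that executes JavaScript; LWP::UserAgent does not execute scripts in the pages it fetches.

```javascript
// Standalone version of the page's cookie check. The browser objects are
// passed in as parameters (hypothetical names) so mocks can be used.
function detectCookieSupport(nav, doc) {
  var cookieEnabled = nav.cookieEnabled;
  // Fallback for browsers that do not expose navigator.cookieEnabled:
  // write a test cookie and check whether it can be read back.
  if (typeof nav.cookieEnabled == "undefined" && !cookieEnabled) {
    doc.cookie = "testcookie";
    cookieEnabled = doc.cookie.indexOf("testcookie") != -1;
  }
  return cookieEnabled ? "yes" : "no";
}

// Mock objects standing in for navigator and document.
var modernBrowser = { cookieEnabled: true };
var cookiesOff = { cookieEnabled: false };
var oldBrowser = {}; // no cookieEnabled property at all

// A document that keeps whatever is written to its cookie property.
function storingDoc() {
  return { cookie: "" };
}

// A document that silently drops cookie writes.
function rejectingDoc() {
  return {
    get cookie() { return ""; },
    set cookie(value) { /* ignored */ }
  };
}

console.log(detectCookieSupport(modernBrowser, storingDoc()));
console.log(detectCookieSupport(cookiesOff, storingDoc()));
console.log(detectCookieSupport(oldBrowser, storingDoc()));
console.log(detectCookieSupport(oldBrowser, rejectingDoc()));
```

The result depends only on the browser-side navigator and document objects, not on the User-Agent string or on the Cookie header sent with the HTTP request.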
Is there any way to get around this JavaScript check? I also tried $ua->agent('Mozilla/5.0'); but that did not work either. Please help!
Thanks.