Chinaunix
标题:
LWP下载知乎答案中的视频
[打印本页]
作者:
523066680
时间:
2018-05-15 17:48
标题:
LWP下载知乎答案中的视频
本帖最后由 523066680 于 2018-10-16 15:36 编辑
首发:
https://zhuanlan.zhihu.com/p/36865994
视频示例:
https://www.zhihu.com/question/271736973/answer/389377346
其中 use Modern::Perl; 不是必需的。
=info
Author: 523066680
Date: 2018-05
=cut
use Modern::Perl;
use LWP::UserAgent;
use File::Slurp;
use JSON;
STDOUT->autoflush(1);
goto_dir("D:/temp");
our $main = "https://lens.zhihu.com/api/videos/";
our $ua = LWP::UserAgent->new( );
our $target = "https://www.zhihu.com/question/271736973/answer/389377346";
my $res = $ua->get( $target );
my $html = $res->content();
my @video = $html=~/>https:.*?video\/(\d+)</g;
my $oauth = get_oauth( $html );
for my $idx ( 0 .. $#video )
{
printf "Getting video %s - %s\n", $idx, $video[$idx];
my @vlinks = get_video_links( $oauth, $video[$idx] );
get_video( @vlinks );
}
# 获取 m3u8 列表并提取链接
sub get_video_links
{
our ($main, $ua);
my ( $oauth, $pgcode ) = @_;
my $res = $ua->get(
$main .$pgcode,
"authorization" => $oauth,
);
die unless $res->is_success();
my $data = decode_json( $res->content );
my $play_url = $data->{playlist}->{sd}->{play_url}; # m3u8 url
my $pre_url;
# 获取网址共用部分
$play_url =~/(.*?\w{32})/;
$pre_url = $1 ."/";
$res = $ua->get( $play_url );
my @vlinks = $res->content =~/\n(.*?\d+\.ts.*?)\n/g;
grep { $_ = $pre_url . $_ } @vlinks;
return $pgcode, @vlinks;
}
# 获取视频切片,合并
sub get_video
{
our $ua;
my $name = shift;
my $buff = "";
my $res;
while ( my $link = shift )
{
print $#_ + 1 ," ";
$res = $ua->get( $link );
$buff .= $res->content();
}
print "\n";
write_file( "${name}.ts", {binmode=>":raw"}, $buff );
}
sub get_oauth
{
our ( $ua );
my $html = shift;
my ($js) = $html =~/(https:[^<>]+main\.app[^<>]+js)/g;
my $res = $ua->get( $js );
# pattern: authorization:"oauth c3cef7c66a1843f8b3a9e6a1e3160e20"}
my ($oauth) = $res->content =~/authorization:"([^"]{30,})"/;
return $oauth
}
sub goto_dir
{
my $dir = shift;
mkdir $dir unless ( -e $dir );
chdir $dir;
}
__DATA__
复制代码
Perl 是不是过时我真的不关心,只用来做想做的事(工作和编程无关)。
2018-10 更新,
7楼
作者:
iamlimeng
时间:
2018-05-15 18:37
感谢分享!
Perl是个能干活的骆驼,用起来很爽。
作者:
dahe_1984
时间:
2018-06-15 19:31
牛,可以做个爬虫
作者:
dahe_1984
时间:
2018-06-15 19:31
牛,可以做个爬虫
作者:
523066680
时间:
2018-06-18 11:25
本帖最后由 523066680 于 2018-06-18 20:06 编辑
回复
3#
dahe_1984
写了tumblr的视频批量下载,不过需要账号登录,加上内容敏感,就没有分享~
作者:
hztj2005
时间:
2018-07-01 08:09
谢楼主,学习下!
作者:
523066680
时间:
2018-10-16 15:35
更新一下,现在知乎简化了,不用oauth,也不用多个ts文件拼接,直接单个MP4
代码保存为 UTF8 编码格式
=info
Author: 523066680
2018-07 知乎去掉了 oauth 授权方式
2018-10 从 ts 多文件,变更为 mp4 单文件下载
=cut
use JSON;
use Encode qw/from_to/;
use LWP::UserAgent;
use Mojo::DOM;
use File::Slurp;
STDOUT->autoflush(1);
our $wdir = "D:/temp";
our $main = "https://lens.zhihu.com/api/videos/";
our $ua = LWP::UserAgent->new();
our $target = "https://www.zhihu.com/question/271736973/answer/389377346";
#our $target = "https://www.zhihu.com/question/285103979/answer/492401516";
#our $target = "https://www.zhihu.com/question/278030511/answer/452274063";
my $res = $ua->get( $target );
my $html = $res->content();
my @video = $html=~/>https:.*?video\/(\d+)</g; # 获取视频页面链接
my $title = get_title_name( $html );
my ($answerID) = ($target=~/\/(\d+)$/);
mkdir $wdir unless -e $wdir;
chdir $wdir;
for my $idx ( 0 .. $#video )
{
printf "Getting video %s - %s\n", $idx, $video[$idx];
get_video( $video[$idx], "${title}Answer_${answerID}_${idx}.mp4" );
}
sub get_video
{
our ($main, $ua);
my ( $pgcode, $fname ) = @_;
my $res = $ua->get( $main .$pgcode );
die unless $res->is_success();
my $data = decode_json( $res->content );
my $play_url = $data->{playlist}->{sd}->{play_url};
$res = $ua->get( $play_url );
write_file( $fname, {binmode=>":raw"}, $res->content );
}
sub get_title_name
{
my $html = shift;
my $dom = Mojo::DOM->new($html);
my $title = $dom->at("title")->text;
$title =~s/ - 知乎//;
from_to( $title, "utf8", "gbk" );
return $title;
}
复制代码
作者:
shjwcs
时间:
2019-03-13 16:44
路过看看啦啦啦啦
作者:
灿烂小猪
时间:
2019-03-29 22:05
看不懂,学习中。
欢迎光临 Chinaunix (http://bbs.chinaunix.net/)
Powered by Discuz! X3.2