Chinaunix
Title:
Extracting and merging rows from two files
Author:
Eva326
Time:
2017-04-18 14:39
This post was last edited by Eva326 on 2017-04-18 16:20
Beginner question for everyone: I have two files, both tab-delimited.
File 1:
ID
1
1.1
2
3
File 2:
1 a1 11
1 a2 12
3 c1 31
3 c2 32
3 c3 33
4 d 41
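Some of the IDs in the first column of file 1 and file 2 match. I want to extract the rows of file 2 whose ID appears in file 1 and write them to files named after the first column, which here gives two files, 1.txt and 3.txt: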
>cat 1.txt
a1 11
a2 12
>cat 3.txt
c1 31
c2 32
c3 33
Any guidance from the experts would be appreciated. Many thanks!
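The request above comes down to a filter-and-group step: keep the rows of file 2 whose first field is listed in file 1, and collect them per ID. A minimal in-memory sketch follows, assuming tab-separated rows and exact string matching of IDs; the sub name `group_rows_by_id` is made up for illustration and does not come from any reply in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only rows whose first tab-separated field
# appears in @$ids, grouped by that field with the ID column removed.
sub group_rows_by_id {
    my ($ids, $rows) = @_;
    my %wanted = map { $_ => 1 } @$ids;    # exact string match, so "1.1" is not "1"
    my %group;
    for my $row (@$rows) {
        my ($id, $rest) = split /\t/, $row, 2;
        next unless defined $rest && $wanted{$id};
        push @{ $group{$id} }, $rest;
    }
    return \%group;
}

# Sample data from the question, written with explicit tabs.
my @ids  = qw(1 1.1 2 3);
my @rows = ("1\ta1\t11", "1\ta2\t12", "3\tc1\t31", "3\tc2\t32", "3\tc3\t33", "4\td\t41");

my $group = group_rows_by_id(\@ids, \@rows);
for my $id (sort keys %$group) {
    # Each of these lists is what would be written to "$id.txt".
    print "$id.txt:\n", map { "  $_\n" } @{ $group->{$id} };
}
```

Only IDs 1 and 3 end up as keys of the result, because 1.1 and 2 have no rows in file 2 and 4 is not listed in file 1.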
Author:
sunzhiguolu
Time:
2017-04-18 16:10
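#!/usr/bin/perl
use strict;
use warnings;

# Read one file. A line with a single field is an ID (file 1);
# any other line is a data row (file 2), stored under its first field.
sub Load {
    local @ARGV = @_;
    my %hData = ();
    while (<>) {
        my @aT = split;
        next unless @aT;    # skip blank lines
        @aT == 1 ? $hData{$aT[0]}++ : push(@{$hData{$aT[0]}}, "@aT[1 .. $#aT]");
    }
    # List context: the keys; scalar context: a reference to the hash.
    return wantarray ? keys %hData : \%hData;
}

my $fb = Load(pop);      # last argument: the data file (file 2)
foreach (Load(pop)) {    # remaining argument: the ID file (file 1)
    next if (!exists($fb->{$_}));
    open(my $FHw, '>', "$_.txt") or die "cannot open $_.txt: $!\n";
    print($FHw join("\n", @{$fb->{$_}}, ''));
    close($FHw);
}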
perl abc.pl a.txt b.txt
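The script above calls `Load` twice and gets two different kinds of value back: a hash reference for `my $fb = Load(pop)` and a list of keys inside `foreach (...)`. That is the effect of `wantarray`, which reports the context the sub was called in. A small sketch of just that mechanism, with a made-up sub `ids` and made-up data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: returns a list or a hash reference depending on the caller.
sub ids {
    my %h = (a => 1, b => 2);
    return wantarray ? (sort keys %h) : \%h;
}

my @list = ids();    # list context: wantarray is true, the keys come back
my $ref  = ids();    # scalar context: wantarray is false, a hash ref comes back

print "list: @list\n";
print "ref:  ", ref($ref), "\n";
```

The same rule explains the loop in the script: the parenthesised list after `foreach` is a list context, so `Load` returns keys there.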
Author:
Eva326
Time:
2017-04-19 09:10
Reply to 2# sunzhiguolu
Thank you very much! For a beginner like me, though, it is a bit hard to follow.
Author:
sunzhiguolu
Time:
2017-04-19 11:17
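#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: collect the IDs from the first file (a.txt).
my $sIdFile = shift(@ARGV);
open(my $FHr, '<', $sIdFile) or die "cannot open $sIdFile: $!\n";
my %hKeys = ();
while (<$FHr>) {
    my ($k) = split;
    next unless defined $k;    # skip blank lines
    $hKeys{$k} = 1;
}
close($FHr);

# Pass 2: group the rows of the second file (b.txt) by their first column.
my %hData = ();
my $sDataFile = shift(@ARGV);
open($FHr, '<', $sDataFile) or die "cannot open $sDataFile: $!\n";
while (<$FHr>) {
    my ($k, @aT) = split;
    next unless defined $k;    # skip blank lines
    push(@{$hData{$k}}, "@aT");    # remaining columns, joined with a space
}
close($FHr);

# Write one <ID>.txt for every ID that occurs in both files.
foreach (keys %hKeys) {
    next if (!exists($hData{$_}));
    open(my $FHw, '>', "$_.txt") or die "cannot open $_.txt: $!\n";
    print($FHw join("\n", @{$hData{$_}}, ''));
    close($FHw);
}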
perl abc.pl a.txt b.txt
Author:
rubyish
Time:
2017-04-21 03:10
This post was last edited by rubyish on 2017-04-20 23:12
perl abc.pl doc1 doc2
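#!/usr/bin/perl -w
my $z = pop;                               # last argument: the data file (doc2)
my %l = map { chomp; $_ => 0 } <>;         # IDs from doc1; 0 means "no file opened yet"
@ARGV = $z;                                # let <> read doc2 next
while (<>) {
    my ( $k, $v ) = split /\t/, $_, 2;     # ID, then the rest of the line (newline included)
    next unless defined $v && exists $l{$k};
    # Open "$k.txt" the first time this ID is seen and keep the handle in %l.
    $l{$k} ||= do { open my $F, '>', "$k.txt" or die "cannot open $k.txt: $!\n"; $F };
    print { $l{$k} } $v;
}
close $_ for grep $_, values %l;           # close only the handles that were opened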
__DATA__
$_
Author:
Eva326
Time:
2017-04-21 14:31
Thanks, everyone! One more small question:
What if file 2 contains duplicate rows:
1 a1 11
1 a1 11
1 a2 12
3 c1 31
3 c2 32
3 c3 33
4 d 41
How can the duplicates be removed from the final result?
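A common Perl idiom for this kind of de-duplication is a `%seen` hash that counts how often each row has appeared. The sketch below is illustrative only; the sub name `unique_rows` is made up, and it assumes that "duplicate" means the whole row is identical, character for character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the rows with exact duplicates removed,
# keeping the first occurrence and the original order.
sub unique_rows {
    my %seen;
    # $seen{$_}++ is false the first time a row appears and true afterwards.
    return grep { !$seen{$_}++ } @_;
}

my @rows = ("1\ta1\t11", "1\ta1\t11", "1\ta2\t12", "3\tc1\t31");
print "$_\n" for unique_rows(@rows);
```

The same test can be applied line by line while reading file 2, so duplicate rows are skipped before anything is written.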
Author:
jason680
Time:
2017-04-27 09:25
Reply to 6# Eva326
If file 2 has duplicates: ... how can they be removed from the final result?
$ cat a.txt
1
1.1
2
3
$ cat b.txt
1 a1 11
1 a1 11
1 a2 12
3 c1 31
3 c2 32
3 c3 33
4 d 41
$ perl gen_x_txt.pl a.txt b.txt
Create output file: 1.txt
Create output file: 3.txt
$ cat 1.txt
a1 11
a2 12
$ cat 3.txt
c1 31
c2 32
c3 33
$ cat gen_x_txt.pl
#! /usr/bin/perl
use strict;
use warnings;

sub message{
    print <<EOF;
Usage : $0 ID_FILE DATA_FILE
Example: $0 a.txt b.txt
EOF
    exit 1;
}

message() if @ARGV != 2;

my(%hIdx, %hTxt, %hCnt);
my $sCnt = 0;
my($sFidx, $sFdat) = @ARGV;

open(my $FHidx, "<", $sFidx) or die "cannot open $sFidx file\n";
open(my $FHdat, "<", $sFdat) or die "cannot open $sFdat file\n";

while(<$FHidx>){
    s/^\s+|\s+$//g;
    next if(m/^(#|$)/);
    $hIdx{$_} = 1;
}

while(<$FHdat>){
    s/^\s+|\s+$//g;
    next if(m/^(#|$)/);
    my($sKey, @aData) = split;
    next if(! exists $hIdx{$sKey});
    next if($hCnt{"$sKey @aData"}++);
    openfile($sKey) if(! exists $hTxt{$sKey});
    print {$hTxt{$sKey}} join(" ",@aData),"\n";
}

sub openfile{
    my($sKey) = @_;
    my $sFile = "$sKey.txt";
    open(my $FHout, ">", $sFile) or die "cannot open $sFile file\n";
    print "Create output file: $sFile\n";
    $hTxt{$sKey} = $FHout;
}