- 论坛徽章:
- 0
|
我有一个文件(TEXT)简化如下:
>taxon1 gene2
———————gene2ATGAA1NNNNNNNNNN
>taxon1 gene3
——————gene3ACAAATTGGCTGGA---GCTGAAGAGCTATTGAAATCTAATAA1NNNNNNNNNN
>taxon1 gene4
—————gene4AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAACTCATCAATCATAA1NNNNNNNNNN
>taxon1 gene5
—————gene5AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAATCATAA1NNNNNNNNNN
>taxon2 gene1
—gene1AGCGACCCAGGTATACTTTCCGCTATTGTACAACAAAAAAACATCACTGAGCAAATTAGTAGTCAACTGGCTACCTTTTGCCAAAAATTTACGCAGGGCTTCTTAGCAACTCATCAATCATAA2NNNNNNNNNN
>taxon2 gene3
——————gene3ACAAATTGGCTGGA---GCTGAAGAGCTATTGAAATCTAATAA2NNNNNNNNNN
>taxon2 gene4
—————gene4AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAACTCATCAATCATAA2NNNNNNNNNN
>taxon2 gene5
—————gene5AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAATCATAA2NNNNNNNNNN
>taxon3 gene1
—gene11AGCGACCCAGGTATACTTTCCGCTATTGTACAACAAAAAAACATCACTGAGCAAATTAGTAGTCAACTGGCTACCTTTTGCCAAAAATTTACGCAGGGCTTCTTAGCAACTCATCAATCATAA3NNNNNNNNNN
>taxon3 gene2
———————gene2ATGAA3NNNNNNNNNN
>taxon3 gene4
—————gene4AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAACTCATCAATCATAA3NNNNNNNNNN
>taxon3 gene5
—————gene5AACAAATTGGCTGGA---GCTGAACTCTCCTTTGCCAAAAATTTACGCAGAGCTTCTTAGCAATCATAA3NNNNNNNNNN
>taxon4 gene1
—gene1AGCGACCCAGGTATACTTTCCGCTATTGTACAACAAAAAAACATCACTGAGCAAATTAGTAGTCAACTGGCTACCTTTTGCCAAAAATTTACGCAGGGCTTCTTAGCAACTCATCAATCATAA4NNNNNNNNNN
>taxon4 gene2
———————gene2ATGAA4NNNNNNNNNN
>taxon4 gene3
——————gene3ACAAATTGGCTGGA---GCTGAAGAGCTATTGAAATCTAATAA4NNNNNNNNNN
我想输出成
>taxon1
gene1-gene2-gene3-gene4-gene5
>taxon2
gene1-gene2-gene3-gene4-gene5
>taxon3
gene1-gene2-gene3-gene4-gene5
>taxon4
gene1-gene2-gene3-gene4-gene5
如果某一个位置的序列不存在,则打印空格。如果不打印空格就很容易,可是打印空格,我死活做不到啊,恳请大神指点,下面是我写的一个初步的脚本,输出没有序列!!!!
#perl test.pl
use strict;
use warnings;
my (%Len, $taxa, $gene);
my @Genepool=("gene1", "gene2", "gene3", "gene4", "gene5");
my $Genepool_ref=\@Genepool;
open(FD, "TEST")||die("Can not open the file!$!n");
$/=">";
<FD>;
while(<FD>){
my ($head, $seq)=split(/\n/, $_,2);
($taxa, $gene)=split(/\s/, $head,2);
if(grep{$gene eq $_}@Genepool){
$Len{$gene}=length($seq);
@Genepool=grep(/[^$gene]/,@Genepool);}
}
close(FD);
my @IDpool=("taxon1", "taxon2", "taxon3", "taxon4");
my $IDpool_ref=\@IDpool;
print "My_cat_results_is \n";
for (my $i=0; $i<=$#$IDpool_ref; $i++) {
print '>', $IDpool[$i], "\n";
open IN, "<TEST" || die "cannot open TEST!";
$/=">";
<IN>;
while(<IN>){
my (%hash, $head1, $seq1, $taxa1, $gene1);
($head1, $seq1)=split(/\n/, $_,2);
($taxa1, $gene1)=split(/\s/, $head1,2);
if($IDpool[$i] eq $taxa1) {$hash{$gene1}=$seq1;}
for(my $j=0; $j<=$#$Genepool_ref; $j++) {
my $bait=$Genepool[$j];
foreach my $k (keys %hash){
if(exists $hash{$bait}) {print $hash{$bait};}
else{print '-' x $Len{$bait};} } }
}
close(IN);
print "\n";
}
|
|