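Last edited by 56836430 on 2015-10-10 21:42
I have many sequences like the ones below and want to filter them by length, for example removing any with fewer than 100 amino acids.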
in.fasta
>lcl|Abi_c1818_g1_i1_m.11845 unnamed protein product
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
------------------------PAVFSKGFNPQHVADGLYGRHLFVYSWPEGSLKQTLDLGSTGLIPLEVRFLHDPAKDTGYVACALSS
TLV----------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
-------------------
>lcl|Abi_c315_g1_i1_m.67006 unnamed protein product
------------------------------------------------------------GP----------------------GY-----
-------------------------------ASPKEA---------MEGPREALIYVTAVYT------------GTGRGKPDYLATVDVDP
TSPTYSKVVHRLPVPHLGDELHHSGWNACSSCHGDASAQRRYLILPSLISGRIYAVDTAKDPRAPVLHKVVEPETILAKTGLGYPHTAHCL
ASGDILVSCLGDKDGNAEGNGFLLLDSDLNVKGR---------------------------------------------------WEKPGN
SPKFGYDFWYQPRHKTMISSSWGAPAAFTKGFNPQHVLDGLYGKHLFVYSWPDGTLKQTIDLGNEGLIPLEVRFLHEPSKDTGYVGCALSG
NMVRFFKTSDGSWDHEVVISVPRFKVQNWILPEMPGLITDLLISLDDRYLYFVNWLHGDVRQYNIEDPKKPVLVGQVWVGGLVRKGSKVIV
EKENGQQWQSDVSDVQGKYLRGGPQMIQLSLDGKRLYVTNSLFSAWDRQFY-PELIEKGSHILQIDCNTEKGGLSVNSNFFVDFETEPEGP
ALAHEMRYPGGDCTSDIWV
out.fasta
>lcl|Abi_c315_g1_i1_m.67006 unnamed protein product
------------------------------------------------------------GP----------------------GY-----
-------------------------------ASPKEA---------MEGPREALIYVTAVYT------------GTGRGKPDYLATVDVDP
TSPTYSKVVHRLPVPHLGDELHHSGWNACSSCHGDASAQRRYLILPSLISGRIYAVDTAKDPRAPVLHKVVEPETILAKTGLGYPHTAHCL
ASGDILVSCLGDKDGNAEGNGFLLLDSDLNVKGR---------------------------------------------------WEKPGN
SPKFGYDFWYQPRHKTMISSSWGAPAAFTKGFNPQHVLDGLYGKHLFVYSWPDGTLKQTIDLGNEGLIPLEVRFLHEPSKDTGYVGCALSG
NMVRFFKTSDGSWDHEVVISVPRFKVQNWILPEMPGLITDLLISLDDRYLYFVNWLHGDVRQYNIEDPKKPVLVGQVWVGGLVRKGSKVIV
EKENGQQWQSDVSDVQGKYLRGGPQMIQLSLDGKRLYVTNSLFSAWDRQFY-PELIEKGSHILQIDCNTEKGGLSVNSNFFVDFETEPEGP
ALAHEMRYPGGDCTSDIWV
I found an example online, but it does not strip the "-" gap characters before computing the length. The script is:
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
my ($infile, $outfile, $cut) = @ARGV;
my $o_seqi = Bio::SeqIO->new(-file => $infile, -format => 'fasta');
my $o_seqo = Bio::SeqIO->new(-file => ">$outfile",-format => 'fasta');
while (my $o_seq = $o_seqi->next_seq) {
next if ($o_seq->length < $cut);
$o_seqo->write_seq($o_seq);
}
How can this script be changed so that the length is counted without the "-" characters?
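One possible change, sketched under the assumption that "-" is the only gap character and that the cutoff applies to the residues left after the gaps are ignored: count the hyphens with `tr` and subtract that count from the string length before comparing against `$cut`. The names `ungapped_length` and `main` are illustrative and not part of the original script. Bio::SeqIO is loaded with `require` inside `main` so the helper can be tried on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of a sequence string with alignment gaps ('-') ignored.
# tr/-// without a replacement list only counts matches; $seq is not modified.
sub ungapped_length {
    my ($seq) = @_;
    return length($seq) - ($seq =~ tr/-//);
}

# Copy every record whose gap-free length is at least $cut.
sub main {
    my ($infile, $outfile, $cut) = @_;
    die "Usage: $0 in.fasta out.fasta min_length\n" unless defined $cut;

    require Bio::SeqIO;    # BioPerl, as in the original script
    my $o_seqi = Bio::SeqIO->new(-file => $infile,     -format => 'fasta');
    my $o_seqo = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

    while (my $o_seq = $o_seqi->next_seq) {
        # $o_seq->seq returns the raw string, gaps included
        next if ungapped_length($o_seq->seq) < $cut;
        $o_seqo->write_seq($o_seq);    # the record is written with its gaps intact
    }
}

# Run the filter only when arguments are given on the command line.
main(@ARGV) if @ARGV;
```

It could be run as `perl filter_fasta.pl in.fasta out.fasta 100` (the script name is hypothetical). The gaps are only ignored for the length test, so the records written to out.fasta keep them, although Bio::SeqIO may rewrap the sequence lines to its own line width.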