- 论坛徽章:
- 7
|
回复 10# 一串儿葡萄皮
我已经在二楼给你提醒过了,ID行有引号的,不是标准的FASTA格式,BioPerl模块不能识别!- >Srcc10g063050.1.1 Receptor-like kinase Try_threonine protein kinase
- MKTKHKLTPRPFTPTPSPTHIMGILLICFIFSITNSFVIAQDDAVPAKSFPVFTPEDNFL
- IDCGATSSITLPGNKAFQPDQNTAKYLSYTGKDIQACASDKINVPSTLYVNAKIFTTEAI
- YTFHASTSGLHWIRLHFFPFKYEEYDLK
- ">Solyc11g072930.1.1 LRR receptor-like serine_threonine-protein kinase, RLP"
- MASSYFLLLVLVLSVFSVSADVFVSLDCGSSEAYTDHETSIDWLGDVDYVANGESHVVPS
- NNSISHVSALEVRGLDSTMYSHVDDNSSDPPRITALYLSKFNLSGSLPDFSSMDAL
- ETIDLSNNNLDGPIPDFFGTLPNLKELNLANNKFSGPVPASLSNKNGLTLDTSGNSDLCS
- SSEESCQNNDSSSPGNDQPTTGSTNNNKKKKKKKNNLPIILGTTISAFLLLWAIVGIFAI
- LHYKNKRAATSLINPGQASGGSTPFVDRVQMSEKIEKNPEVTAHDHENSTNV
- ">Solyc12g00890.1.1 Kinesin IPR001752 Kinesin, motor region"
- ESVINGRDLLGFSLTSPDLVICTGSPDIPARNYGDSPEFLKGCSISLENGI
- KGSEEVQAATKLFTDWQGSKDDDLCAPADFELPSPPVEENSSELSVPIVSINVGSTDCIS
- SESGIQFSEDKYFCGGNVLSTDTRIEESICASVYQTARVGNFSYHFNNLSAGFYLVDLHF
- VEVVLTDGSTGDFSENSPQRNSLEVNGDIKAAGKLQLANVSREK
复制代码 这些示例数据,就用BioPerl的Bio::SeqIO模块来读只能识别第一个序列!- #!/usr/bin/perl
- use strict;
- use Bio::SeqIO;
- use Bio::Seq;
- my $fasta=shift @ARGV;
- (my $Out=$fasta)=~s/(\.[^\.]+)$/_out$1/;
- open (IN,"<$fasta") or die "Open $fasta failed!\n";
- my $Out=Bio::SeqIO->new(-file=>">$Out",-format=>'fasta');
- my ($flag,$seq,$id,$desc)=();
- while(my $line=<IN>){
- chomp($line);
- if($line=~/\>/){
- if($flag){
- print "$id\t$desc\n$seq\n";
- my $SEQ_OBJ=Bio::Seq->new(-seq=>$seq,-id=>$id,-desc=>$desc,-alphabet=>'protein');
- $Out->write_seq($SEQ_OBJ);
- }
- $line=~s/[\"\>]//g;
- ($id,$desc)=split (/\s+/,$line,2);
- $seq=undef;
- $flag=1;
- }else{
- $seq.=$line;
- }
- }
- print "$id\t$desc\n$seq\n";
- my $SEQ_OBJ=Bio::Seq->new(-seq=>$seq,-id=>$id,-desc=>$desc,-alphabet=>'protein');
- $Out->write_seq($SEQ_OBJ);
- $Out->close();
复制代码 |
|