Chinaunix
Title:
Perl code to rename IDs
Author:
一串儿葡萄皮
Time:
2015-07-30 15:45
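I have n sequences in FASTA format and want to rename their IDs to numbers starting from 1, i.e. seq1, seq2, ..., seqn. Could one of the experts help me write this in Perl?
My file format and IDs look like this: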
>AT3G56040.1|PACid:19662817
MANPQASPILHHPQNHLSLFHFRTTTSPRSFSSLHFRKPLLFLSSSSSFSSKLQQSEQQCNNHQVRHVSTVPVEYSTPTPPESDDFLSEIDRLKSLLSKLDVSKDLRRKDAVIDADSRVRRFFSENRGGLSKVFGYLGLNSNEMFLVKCVIAAGQEHALCMNYEEAFGEEEEEYTVRSSVKNALYALVEMIERFDVNSSGYKGRREMGTVLDSEEIAHFRKFLTFLEEIEQFYDCIGGIIGYQVMVLELLHQSSKRRNTNRSQLVEESLGCQYLEMHTPSVLDLTQEEDYASQAALWGIEGLPDLGEIYPLGGAADRLGLIDSETGECLPAAMLAHCGRTLLEGLIRDLQAREFLYFKLYGKQCVTPVAIMTSAAKNNHEHVSSLCERLKWFGRGQSNFRLFEQPLVPAVSAEDGQWIVSKPFVPVSKPGGHGVIWKLAYDKGVFNWFYDHGRKGATVRQVSNVVAATDVTLLALAGIGLRYNKKLGFASCKRNAGATEGINVLMEKKNFDGKWEYGISCIEYTEFDKFDISNRSPSSNGLQADFPANTNILYVDLHSAELIGSSSNAKSLPNMVLNTKKRIEYLDQYGDYHSVMGGRLECTMQNIADNFFNKFPSRCHGSLEDKLDTYIVYNERRKVTSSAKKKKPHASAALHQTPDGALLDILRNGYDLLTECDIKLPMIEANDKYVDSPPPYLILLHPALGPLWEVSRQKFKGGSISSCSELQLEIAEFSWNNVQVDGSLIVTAENAMGSTTPNDNGEPILQYGLRCGKCKLHNVNVVNRGIDWNSKSNVYWRNDVNRLETCKIILHGNAEFEASNVTIEGHHVFEVPDGHKLKITSGNAGLSINLEALKEEVMETGSWYWNYQLNGSHIHLQQVEVSQS*
>AT3G03250.1|PACid:19663528
MAATTENLPQLKSAVDGLTEMSESEKSGFISLVSRYLSGEAQHIEWSKIQTPTDEIVVPYEKMTPVSQDVAETKNLLDKLVVLKLNGGLGTTMGCTGPKSVIEVRDGLTFLDLIVIQIENLNNKYGCKVPLVLMNSFNTHDDTHKIVEKYTNSNVDIHTFNQSKYPRVVADEFVPWPSKGKTDKEGWYPPGHGDVFPALMNSGKLDTFLSQGKEYVFVANSDNLGAIVDLTILKHLIQNKNEYCMEVTPKTLADVKGGTLISYEGKVQLLEIAQVPDEHVNEFKSIEKFKIFNTNNLWVNLKAIKKLVEADALKMEIIPNPKEVDGVKVLQLETAAGAAIRFFDNAIGVNVPRSRFLPVKASSDLLLVQSDLYTLVDGFVTRNKA
Author:
一串儿葡萄皮
Time:
2015-07-30 16:24
Did I not explain it clearly enough? Nobody has replied...
Author:
climby
Time:
2015-07-30 16:28
Reply to #2 一串儿葡萄皮
Do you mean changing the lines that contain ">" to seq1, seq2, ... seqn?
I don't know the FASTA format.
Author:
一串儿葡萄皮
Time:
2015-07-30 16:39
Reply to #3 climby
Yes, exactly. A line starting with > holds the ID; the next line is the sequence.
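Author:
MMMIX
Time:
2015-07-30 16:43
Reply to #1 一串儿葡萄皮
Try this:
perl -nE '$n = ($.+1)/2; /^>/ ? print ">seq$n\n" : print' data.txt
(This assumes every record is exactly two lines: one header line followed by one sequence line.)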
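The one-liner above derives the record number from the line number (`$.`), so it only works when each sequence sits on a single line. FASTA files are often wrapped at a fixed width; with `($.+1)/2`, a header on line 4 would be numbered 2.5. Counting headers avoids that. A minimal sketch, where `renumber_headers` is a hypothetical helper name and the input is a made-up, shortened two-record example whose first sequence is wrapped:

```perl
use strict;
use warnings;

# Hypothetical helper: rename every FASTA header to >${prefix}1, >${prefix}2, ...
# It counts headers instead of deriving the number from the line number ($.),
# so wrapped (multi-line) sequences are handled correctly.
sub renumber_headers {
    my ($prefix, @lines) = @_;
    my $n = 0;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^>/) {
            $n++;
            push @out, ">$prefix$n\n";   # header: replace the whole ID
        }
        else {
            push @out, $line;            # sequence line: keep as-is
        }
    }
    return @out;
}

# Made-up example input: the first record's sequence spans two lines.
my @fasta = (
    ">AT3G56040.1|PACid:19662817\n",
    "MANPQASPIL\n",
    "HHPQNHLSLF\n",
    ">AT3G03250.1|PACid:19663528\n",
    "MAATTENLPQ\n",
);
my @renamed = renumber_headers('seq', @fasta);
print @renamed;
# prints:
# >seq1
# MANPQASPIL
# HHPQNHLSLF
# >seq2
# MAATTENLPQ
```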
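Author:
climby
Time:
2015-07-30 16:48
#!/usr/bin/perl
use strict;
use warnings;

my $i = 1;    # number for the next header
# Reads the records embedded after __DATA__ below;
# replace <DATA> with <> to read files named on the command line instead.
while (my $line = <DATA>) {
    if ($line =~ /^>/) {
        # Header line: replace the whole ID with >seq1, >seq2, ...
        print ">seq$i\n";
        $i++;
    }
    else {
        # Sequence line: print unchanged
        print $line;
    }
}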
__DATA__
>AT3G56040.1|PACid:19662817
MANPQASPILHHPQNHLSLFHFRTTTSPRSFSSLHFRKPLLFLSSSSSFSSKLQQSEQQCNNHQVRHVSTVPVEYSTPTPPESDDFLSEIDRLKSLLSKLDVSKDLRRKDAVIDADSRVRRFFSENRGGLSKVFGYLGLNSNEMFLVKCVIAAGQEHALCMNYEEAFGEEEEEYTVRSSVKNALYALVEMIERFDVNSSGYKGRREMGTVLDSEEIAHFRKFLTFLEEIEQFYDCIGGIIGYQVMVLELLHQSSKRRNTNRSQLVEESLGCQYLEMHTPSVLDLTQEEDYASQAALWGIEGLPDLGEIYPLGGAADRLGLIDSETGECLPAAMLAHCGRTLLEGLIRDLQAREFLYFKLYGKQCVTPVAIMTSAAKNNHEHVSSLCERLKWFGRGQSNFRLFEQPLVPAVSAEDGQWIVSKPFVPVSKPGGHGVIWKLAYDKGVFNWFYDHGRKGATVRQVSNVVAATDVTLLALAGIGLRYNKKLGFASCKRNAGATEGINVLMEKKNFDGKWEYGISCIEYTEFDKFDISNRSPSSNGLQADFPANTNILYVDLHSAELIGSSSNAKSLPNMVLNTKKRIEYLDQYGDYHSVMGGRLECTMQNIADNFFNKFPSRCHGSLEDKLDTYIVYNERRKVTSSAKKKKPHASAALHQTPDGALLDILRNGYDLLTECDIKLPMIEANDKYVDSPPPYLILLHPALGPLWEVSRQKFKGGSISSCSELQLEIAEFSWNNVQVDGSLIVTAENAMGSTTPNDNGEPILQYGLRCGKCKLHNVNVVNRGIDWNSKSNVYWRNDVNRLETCKIILHGNAEFEASNVTIEGHHVFEVPDGHKLKITSGNAGLSINLEALKEEVMETGSWYWNYQLNGSHIHLQQVEVSQS*
>AT3G03250.1|PACid:19663528
MAATTENLPQLKSAVDGLTEMSESEKSGFISLVSRYLSGEAQHIEWSKIQTPTDEIVVPYEKMTPVSQDVAETKNLLDKLVVLKLNGGLGTTMGCTGPKSVIEVRDGLTFLDLIVIQIENLNNKYGCKVPLVLMNSFNTHDDTHKIVEKYTNSNVDIHTFNQSKYPRVVADEFVPWPSKGKTDKEGWYPPGHGDVFPALMNSGKLDTFLSQGKEYVFVANSDNLGAIVDLTILKHLIQNKNEYCMEVTPKTLADVKGGTLISYEGKVQLLEIAQVPDEHVNEFKSIEKFKIFNTNNLWVNLKAIKKLVEADALKMEIIPNPKEVDGVKVLQLETAAGAAIRFFDNAIGVNVPRSRFLPVKASSDLLLVQSDLYTLVDGFVTRNKA
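Renaming throws away the original IDs, so it can help to keep a lookup table alongside the renamed file. A minimal sketch of one way to do that, not taken from the thread: the hypothetical `rename_with_map` reads FASTA from one filehandle, writes the renamed records to a second, and writes tab-separated `seqN<TAB>original header` lines to a third. The demo uses made-up, shortened records held in memory via `open` on scalar references, so it runs without any files:

```perl
use strict;
use warnings;

# Hypothetical helper: copy FASTA from $in to $out, replacing each header
# with >seqN, and log "seqN<TAB>original header text" to $map.
sub rename_with_map {
    my ($in, $out, $map) = @_;
    my $n = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>(.*)/) {      # $1 = header text without ">" and newline
            $n++;
            print {$out} ">seq$n\n";
            print {$map} "seq$n\t$1\n";
        }
        else {
            print {$out} $line;       # sequence lines are copied unchanged
        }
    }
    return $n;                        # number of records renamed
}

# Demo on made-up, shortened records held in memory.
my $fasta = ">AT3G56040.1|PACid:19662817\nMANPQ\n>AT3G03250.1|PACid:19663528\nMAATT\n";
open my $in,  '<', \$fasta      or die "Cannot read input: $!";
open my $out, '>', \my $renamed or die "Cannot write output: $!";
open my $map, '>', \my $table   or die "Cannot write map: $!";
rename_with_map($in, $out, $map);
close $_ for $in, $out, $map;
print $renamed, "--- ID map ---\n", $table;
```

For real data, open `$in`, `$out` and `$map` on file names instead of scalar references (for example a hypothetical `id_map.tsv` for the table).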
Author:
一串儿葡萄皮
Time:
2015-07-30 17:03
Reply to #6 climby
Got it working. Thank you so, so much!
Author:
一串儿葡萄皮
Time:
2015-07-30 17:06
Reply to #5 MMMIX
Thank you. Should I say this in English? The problem is resolved. Thank you very much.
Author:
MMMIX
Time:
2015-07-30 17:23
Reply to #8 一串儿葡萄皮
Whichever is convenient for you; as long as you get your meaning across, that's fine.
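Author:
b114213903
Time:
2015-07-30 18:34
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $fasta = shift @ARGV or die "Usage: $0 input.fasta\n";
# Output file: insert "_out" before the extension (seqs.fa -> seqs_out.fa);
# without an extension, append "_out" so the input is never overwritten.
(my $Out = $fasta) =~ s/(\.[^.\/]+)$/_out$1/;
$Out .= '_out' if $Out eq $fasta;
my $IN  = Bio::SeqIO->new(-file => $fasta,  -format => 'fasta');
my $OUT = Bio::SeqIO->new(-file => ">$Out", -format => 'fasta');
my $n = 0;
while (my $Seq = $IN->next_seq) {
    $n++;
    print "Now $n:\t", $Seq->id, "\n";    # log the original ID
    $Seq->id("seq$n");                     # new ID: seq1, seq2, ...
    $OUT->write_seq($Seq);
}
$IN->close();
$OUT->close();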
Author:
MMMIX
Time:
2015-07-30 19:12
Reply to #10 b114213903
I knew there would be a module for this...
Author:
b114213903
Time:
2015-07-31 13:39
Reply to #11 MMMIX
The BioPerl modules are really powerful!
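Author:
MMMIX
Time:
2015-07-31 13:52
Reply to #12 b114213903
The key point is that it is purpose-built for the (biological) domain; however you look at them, the other solutions come across as hackish.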
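Author:
b114213903
Time:
2015-07-31 14:41
Reply to #13 MMMIX
Yes, you're quite right! This module provides exactly what bioinformaticians need: sequence format conversion, fetching data from online databases, data processing and analysis, and reports and plots, all as one suite of packages. Very capable! But it takes time to learn, and even installing it can be a hassle, heh.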
Author:
sunzhiguolu
Time:
2016-09-10 11:36
perl -nle 'print /\A>/ ? ">seq" . ++$n : $_' f
Author:
华小飞_Perl
Time:
2016-09-11 19:51
Last edited by 华小飞_Perl on 2016-09-12 14:59
Here's one more:
#!/usr/bin/perl
use warnings;
use strict;
my $i = 0;
while (<>) {
    chomp;
    if (/^>/) {
        $i++;
        print ">seq", $i, "\n";    # replace the header with >seqN
    }
    else {
        print $_, "\n";            # sequence lines pass through unchanged
    }
}
Welcome to Chinaunix (http://bbs.chinaunix.net/)