Chinaunix
Title:
Help wanted
Author:
zhagnqiang829
Time:
2013-03-16 09:27
Extracting the last pair of sequences
Given the following sequences:
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13 47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42 57
2_NC_003070
I001CATTCTAAAAACATAATTACCTAGAAGAAACTGGGTTAATCCCA
E012ATGCGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATG
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24 39
I002CAGACAAAATTAGTGAAAGAAAGGGAAAAACCCACAAAGGGA
E023GGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGAT
I002 35 CAAAGGGATAAAGAAAGGT 53 92
E023 80 CAAAGGTATATAGACACGT 98 138
19 59
I003CATTTTTTGATAAAGTTAAGTTCAAGAACTAAAGTTCGTAACGATCAAGGACAG
E034CAATATTCAGCATCTGTTGTGGAAGTTGGTCTTCGCCTATCTTCTTCTAGACTGTT
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17 40
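Requirement: under each gene name above (1_NC_003070, 2_NC_003070), extract the last pair of matched sequences (the part marked in red in the original post).
The expected result is: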
1_NC_003070
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
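A minimal sketch of the extraction rule stated above, assuming each record starts with a header line such as 1_NC_003070 and that the alignment lines are the ones of the form "ID start SEQUENCE end length". The helper name extract_last_pairs is hypothetical. Rather than relying on line positions, it streams through the lines and remembers only the two most recent alignment lines of the current record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each record, return its header line followed by
# the last two alignment lines seen in that record.
# Assumption: headers look like "1_NC_003070"; alignment lines look like
# "I002 151 ATGAGAGTGG... 192 210" (ID, start, sequence, end, length).
sub extract_last_pairs {
    my @lines = @_;
    my ($header, @pair, @out);

    # Emit the current record if it has a header and a complete pair.
    my $flush = sub {
        push @out, $header, @pair if defined $header && @pair == 2;
    };

    for my $line (@lines) {
        $line =~ s/\s+\z//;                      # strip newline / trailing blanks
        if ($line =~ /^\d+_/) {                  # a new record header
            $flush->();
            ($header, @pair) = ($line);
        }
        elsif ($line =~ /^\w+\s+\d+\s+[ACGTN]+\s+\d+\s+\d+$/) {
            push @pair, $line;                   # an alignment line
            shift @pair while @pair > 2;         # keep only the last two
        }
        # raw sequence lines and "13 47"-style summary lines are ignored
    }
    $flush->();                                  # emit the final record
    return @out;
}

my $sample = <<'END';
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13 47
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42 57
2_NC_003070
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24 39
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17 40
END

print "$_\n" for extract_last_pairs(split /\n/, $sample);
```

To run it on a real file, pass the file's lines instead of the sample, e.g. extract_last_pairs(<>) with the file name on the command line. Because alignment lines are recognized by pattern, this does not depend on each record ending with exactly one summary line like "13 47".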
Author:
zhagnqiang829
Time:
2013-03-16 09:33
Author:
rubyish
Time:
2013-03-16 10:46
This post was last edited by rubyish at 2013-03-16 06:48
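#!/usr/bin/perl
use 5.016;    # enables strict and say
# Slurp DATA, split it into records before each line starting with "digits_",
# then print each record's header plus its 3rd- and 2nd-to-last lines.
say for map {(split /\n/)[0, -3, -2]} split /^(?=\d+_)/m, do {local $/; <DATA>};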
__DATA__
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13 47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42 57
2_NC_003070
I001CATTCTAAAAACATAATTACCTAGAAGAAACTGGGTTAATCCCA
E012ATGCGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATG
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24 39
I002CAGACAAAATTAGTGAAAGAAAGGGAAAAACCCACAAAGGGA
E023GGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGAT
I002 35 CAAAGGGATAAAGAAAGGT 53 92
E023 80 CAAAGGTATATAGACACGT 98 138
19 59
I003CATTTTTTGATAAAGTTAAGTTCAAGAACTAAAGTTCGTAACGATCAAGGACAG
E034CAATATTCAGCATCTGTTGTGGAAGTTGGTCTTCGCCTATCTTCTTCTAGACTGTT
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17 40
Output:
1_NC_003070
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
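The answer above reads its input from __DATA__. Below is a sketch of the same split-then-index technique wrapped in a subroutine (the name last_pair_by_index is hypothetical) so it can be tested and pointed at a real file. The split is anchored to line starts with ^ and /m, so a header such as 12_NC_... is not cut apart. Like the original, it assumes every record ends with exactly one summary line such as "17 40", because it takes the 3rd- and 2nd-to-last lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: split the text into records at every line that
# starts with "digits_", then keep each record's first line (the header)
# and its 3rd- and 2nd-to-last lines (the last alignment pair).
# Assumes each record ends with exactly one summary line such as "17 40".
sub last_pair_by_index {
    my ($text) = @_;
    return map { (split /\n/)[0, -3, -2] } split /^(?=\d+_)/m, $text;
}

my $sample = <<'END';
1_NC_003070
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13 47
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42 57
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17 40
END

# For a real file given on the command line, slurp it instead:
#   my $text = do { local $/; <> };
print "$_\n" for last_pair_by_index($sample);
```

This version is short, but if a record lacks the trailing summary line it silently returns the wrong lines; matching alignment lines by pattern avoids that dependency.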
Welcome to Chinaunix (http://bbs.chinaunix.net/)