Chinaunix

Title: Help request

Author: zhagnqiang829    Time: 2013-03-16 09:27
Title: Help request
Extract the last pair of sequences
The sequences are as follows:
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13  47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42  57
2_NC_003070
I001CATTCTAAAAACATAATTACCTAGAAGAAACTGGGTTAATCCCA
E012ATGCGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATG
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24  39
I002CAGACAAAATTAGTGAAAGAAAGGGAAAAACCCACAAAGGGA
E023GGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGAT
I002 35 CAAAGGGATAAAGAAAGGT 53 92
E023 80 CAAAGGTATATAGACACGT 98 138
19  59
I003CATTTTTTGATAAAGTTAAGTTCAAGAACTAAAGTTCGTAACGATCAAGGACAG
E034CAATATTCAGCATCTGTTGTGGAAGTTGGTCTTCGCCTATCTTCTTCTAGACTGTT
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17  40
Requirement: for each gene name above, extract the last pair of matched sequences (the part shown in red).
The expected result is:
1_NC_003070
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56

Author: zhagnqiang829    Time: 2013-03-16 09:33
Extract the last pair of sequences
The sequences are as follows:
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200

13  47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42  57
2_NC_003070
I001CATTCTAAAAACATAATTACCTAGAAGAAACTGGGTTAATCCCA
E012ATGCGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATG
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24  39
I002CAGACAAAATTAGTGAAAGAAAGGGAAAAACCCACAAAGGGA
E023GGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGAT
I002 35 CAAAGGGATAAAGAAAGGT 53 92
E023 80 CAAAGGTATATAGACACGT 98 138
19  59
I003CATTTTTTGATAAAGTTAAGTTCAAGAACTAAAGTTCGTAACGATCAAGGACAG
E034CAATATTCAGCATCTGTTGTGGAAGTTGGTCTTCGCCTATCTTCTTCTAGACTGTT
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56

17  40
Requirement: for each gene name above, extract the last pair of matched sequences (the part shown in red).
The expected result is:
1_NC_003070
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56

Author: rubyish    Time: 2013-03-16 10:46
Last edited by rubyish on 2013-03-16 06:48
#!/usr/bin/perl
use 5.016;

# Slurp the DATA section and split it into one chunk per gene; a new chunk
# starts at every line that begins with "<digits>_" (the gene name).
# From each chunk print the first line (the gene name) and the third- and
# second-to-last lines (the last matched pair, just before the final score line).
say for map {(split /\n/)[0, -3, -2]} split /^(?=\d+_)/m, do {local $/; <DATA>};

__DATA__
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200
13  47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42  57
2_NC_003070
I001CATTCTAAAAACATAATTACCTAGAAGAAACTGGGTTAATCCCA
E012ATGCGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATG
I001 57 GATAATTTAAAGAAGTTTCAAAGG 80 107
E012 6 GAAATTGCAAGGAAGTAGCAGATG 29 52
24  39
I002CAGACAAAATTAGTGAAAGAAAGGGAAAAACCCACAAAGGGA
E023GGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGAT
I002 35 CAAAGGGATAAAGAAAGGT 53 92
E023 80 CAAAGGTATATAGACACGT 98 138
19  59
I003CATTTTTTGATAAAGTTAAGTTCAAGAACTAAAGTTCGTAACGATCAAGGACAG
E034CAATATTCAGCATCTGTTGTGGAAGTTGGTCTTCGCCTATCTTCTTCTAGACTGTT
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
17  40

Output:
1_NC_003070
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
2_NC_003070
I003 1 CATTTTTTGATAAAGTT 17 249
E034 11 CATCTGTTGTGGAAGTT 27 56
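The one-liner above picks lines by position, so it relies on every gene block ending with exactly one score line and containing no blank lines. If the real file has stray blank lines (as in the second copy of the data in the question), a line-by-line version may be more forgiving. The following is only a sketch under assumptions: the helper name last_pairs is made up for illustration, gene names are assumed to look like "<digits>_<text>" on a line of their own, and matched lines are assumed to have five space-separated fields (ID, number, sequence of A/C/G/T, number, number) and to come in I/E pairs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# last_pairs is a hypothetical helper name, not part of the original answer.
# It returns each gene name followed by the last I/E pair of matched lines.
sub last_pairs {
    my ($text) = @_;
    my (@out, @pair, @last);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\d+_\S+$/) {
            # New gene name: emit the previous gene's last pair, then the name.
            push @out, @last, $line;
            @pair = @last = ();
        }
        elsif ($line =~ /^[IE]\d+ \d+ [ACGT]+ \d+ \d+$/) {
            # Matched line in the assumed five-field layout.
            push @pair, $line;
            if (@pair == 2) {
                @last = @pair;   # remember the most recent complete pair
                @pair = ();
            }
        }
        # Raw sequences, score lines and blank lines are skipped.
    }
    return (@out, @last);
}

# First gene from the question, including the stray blank line.
my $sample = <<'END';
1_NC_003070
I001CATTCAAGGCTTAAAAGACTTAAAGTAAACGTTCATTAGCTAAATCCAA
E012CACTATCTCCGTAACAAAATCGAAGGAAACACTAGCCGCGACGTTG
I001 63 AAAAAAAAACAAA 75 83
E012 153 AGAAAACAACAAA 165 200

13  47
I002CAAAAGAAGATAAGTATATATATATATATATATATACACCTATAT
E023TAAAAGGGTTTTGGTGTTCCTCGATGGAAGATACCCTGACAAAACCAAATCTGA
I002 151 ATGAGAGTGGTTACAAAATATGACTACTTGAATACACGAATG 192 210
E023 96 ATCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTG 137 200
42  57
END

print "$_\n" for last_pairs($sample);
```

To run it on a real file, the text could be read with something like `do { local $/; <STDIN> }` and passed to last_pairs in place of $sample. Because lines are recognised by their shape rather than their position, a missing or extra score line should not shift the result.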




