rubyish posted on 2014-03-10 01:04

[Exercise] Indexing sequences

Last edited by rubyish on 2014-03-09 23:34

Given the following data set:
>1
GCGTGCGTAA
AAAATAAATCACCTTTCGGGCTAACTTTGCGGTCGAGAACTCATTACCCAAATCCTACAATCACAAATTTG
ACATACTTAGAATTAAAAAACCAAACCACACACAGAAACAAGGTTAAGATAAATGAACAAAGGAGAATGATTTAGTTAGTAACCTCAACATTAGAGAGCT
TTCCTTCGTTTAATACTTTGAAGATGGCAAAACCACCTGGCGTCTCAAACAGTATTAGCATTTTTACAAGCTCCTGAAAAGAAGAAAAAACAAGATTAAG
AGAAACCCTTGAATCAACTCAAAGTCACCAAACTTGCAAGTTAGTGTTTTACGAGCTAAAGCCATTAAGCTAGTACGAAACAATATTCAAGGTAAAACTT
TTTCCCTGTTTCGAAGTTTACAAATCAGAATAATAGTTAAGGCAGATACTCTGTTCAAATTCTTGAAAATCCGACAAAGAACAGAACTATACTTTGTCAA
AGTCTTAGAATTTTGGAAGCTTATTTGCCATAACAACGAAGAAAGAGAGGGCGAGAGAGATTCAGTACCTTAGAGGGTTAACGCACAACTACAGCTTTAG
AGAGAGCTAAAATCAGGAGCCGCGATTCTGCTAGGGTTTAAGATGGTTTTTA
>2
TAAAAAAAAAATAACATCATTATATAATATATAGAGTTTAAAACATCTCAAAAACAAATTCATCATATTTTGTGATTCGAAATTTTAAGAATGAACATAT
ATTAACTAATTGGCGAAAAATGCGTGGGTTCAACGTCCCGCAACGAATAAAATATTTTGACAATGATTCATAAACATATTATAAATAAGATCAACATTAA
TAAAATAAATAATTTTTTTTTGTGGATGGATTTGGTTTGGCAGGACGTTACTTAATAACAATTGTAAACTATAAAATAATTTACAAATTTT
ATATATATTAATTTAAAAAATGAATTGTCTACGCGGTGTACCGCATGTTAAAATTTAGTTTCTATATATTTTAGAAACAACTTTGAATTTATACTTTAAT
ATTGAATAAACAACACCAAACCCCCTATTATTCATGTTATCCATTTTTTGAAATAACAGAAAAATAGAAAATAATCATA
AGAAACCAAACAAAATATACACAACAAAAAATCAAATCATAAAGCTTTAAATACATATAAGTGAAAGATCAAATCATAAAACTATAAAGACATGAAGTAC
CTAAAACATAATATATGCTAAAAAGAAATTCAAAATACAAAATCTTCTACGTATTTGAATAATTCATCCAAACCTAAAACTGTATATCTGTTCACATATT
TGAATGAAAAATCAAATAAGCAAATCAAGCAAGAAATATTAATGATGTCTATGTGATTTTTTTTTTTGT
>3
TTAATCTGCTTTTTTTTTTTTTTTTTAATTTACTCATATTAGATTTAGCTTAATTTTGAGACTGTTAGCTTTCGGTGTGAACAAAAGAAATTTGTGAAAT
TTGATATTGTTGATACATTCTCTAGAAATTTTGGAAAGATTGTGTGTTTCTTTTCAAAATTCAAATATTAATAACGCACCAAAATATCTGAATAGAAAGA
ATAAATAATGCGCCAAAATATTGATATGATGAAAGGTTCCCGTCTCAATATGTTTTTAGACCCTAGGTAAAACTAAAT
TTACATATCCTTTTCACACGATTTTTTTTTTTTTTTTTTGACTCTTTTACTTAAAGGTTTTTTTAAAAAAATTTGCCATGCACCCTGGCAATGGCTTTTG
CCCCCACCTCCCCCACATTAAGCCAATCTTGTTGCATGGCCATCTCCCCTGACGAACACCATTAAAGATTCATCTATATGTGGTA
GCCACTGAGTAGACTTAATAGAGCATTAAATAAATGAAATTCGTGGATGCAAATTGTAGAAGAACTAGTATTTAACGGAGTGTTGCTTCATCACAAATTC



1: For each record, find the positions of the six-base motif GCGT** (the two asterisks stand for any two bases following GCGT)

2: Non-overlapping
>1
GCGTGCGTAA
1       GCGTGC 0

3: Overlapping
>1
GCGTGCGTAA
1       GCGTGC 0, GCGTAA 4

4: The result should look like this
when Overlapping:
1       GCGTGC 0, GCGTAA 4, GCGTCT 220
2       GCGTGG 121
3
when Non-overlapping:
1       GCGTGC 0, GCGTCT 220
2       GCGTGG 121
3
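
The difference between the two modes in points 2 and 3 can be sketched in Ruby roughly as follows. This is only one possible approach, not a reference solution from the thread: `find_motifs` is a hypothetical helper name, and the sketch assumes that `**` means "any two characters", written as `..` in the regex.

```ruby
# Hypothetical helper: returns [[match, position], ...] for the six-base
# motif GCGT followed by any two characters.
def find_motifs(seq, overlapping: false)
  # A lookahead consumes no characters, so the scan moves on one base at a
  # time and hits that overlap an earlier hit are reported as well.
  pattern = overlapping ? /(?=(GCGT..))/ : /GCGT../
  hits = []
  seq.scan(pattern) do
    m = Regexp.last_match
    # In overlapping mode the motif is in capture group 1 (group 0 is empty).
    hits << [overlapping ? m[1] : m[0], m.begin(0)]
  end
  hits
end

p find_motifs("GCGTGCGTAA")                     # => [["GCGTGC", 0]]
p find_motifs("GCGTGCGTAA", overlapping: true)  # => [["GCGTGC", 0], ["GCGTAA", 4]]
```

The plain pattern resumes scanning after the end of each hit, which is why `GCGTAA` at position 4 only shows up in overlapping mode.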

rubyish posted on 2014-03-11 00:05

Last edited by rubyish on 2014-04-24 19:04

dddddddddddd~~~

klainogn posted on 2015-07-26 19:42

Last edited by klainogn on 2015-07-26 19:45

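class String
   # Return the non-overlapping matches of pattern as "MATCH POS" entries
   # joined by ",". offset is the index of self within the original string.
   def srch(pattern, offset=0)
         if idx = (self =~ pattern)
            # idx is relative to self; add offset to get the original position
            pos = offset + idx
            s = $& + " " + pos.to_s
            # keep searching in the text after this match
            ext = $'.srch(pattern, pos + $&.length)
            s += "," + ext if !ext.empty?
            return s
         else
            return ''
         end
   end
end
# Scan the file "a" line by line; positions are relative to each line
File.foreach("a") {|line|
   rs = line.chomp.srch(/GCGT../)
   p rs if !rs.empty?
}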


"GCGTGC 0"
"GCGTCT 39"
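
The reply above scans the file line by line, so its positions are relative to each line (39 rather than the 220 expected in the first post), and a motif that straddles a line break would be missed. Below is a minimal sketch of one way to get record-level positions, assuming each record starts with a `>` header and its sequence lines are simply concatenated. `read_records` and `motif_report` are hypothetical names, and the sample input is a toy record invented for illustration, not data from the thread.

```ruby
# Hypothetical helper: parse FASTA-like text into { header => sequence },
# joining every sequence line that belongs to the same ">" record.
def read_records(text)
  records = {}
  name = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      name = line[1..-1]
      records[name] = String.new
    elsif name && !line.empty?
      records[name] << line
    end
  end
  records
end

# Non-overlapping hits as "MATCH POS" entries; positions are relative to
# the start of the joined record, not to an individual line.
def motif_report(seq, pattern = /GCGT../)
  hits = []
  seq.scan(pattern) { hits << "#{$&} #{$~.begin(0)}" }
  hits.join(", ")
end

# Toy input: the motif in record 1 is split across two lines.
sample = <<~FASTA
  >1
  AAGC
  GTCTAA
  >2
  TTTT
FASTA

read_records(sample).each do |name, seq|
  puts "#{name}       #{motif_report(seq)}".rstrip
end
```

Joining the lines first means the positions line up with the numbering used in the first post, at the cost of holding each whole record in memory.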