[Exercise] Indexing sequences
Last edited by rubyish on 2014-03-09 23:34
Given the following data:
>1
GCGTGCGTAA
AAAATAAATCACCTTTCGGGCTAACTTTGCGGTCGAGAACTCATTACCCAAATCCTACAATCACAAATTTG
ACATACTTAGAATTAAAAAACCAAACCACACACAGAAACAAGGTTAAGATAAATGAACAAAGGAGAATGATTTAGTTAGTAACCTCAACATTAGAGAGCT
TTCCTTCGTTTAATACTTTGAAGATGGCAAAACCACCTGGCGTCTCAAACAGTATTAGCATTTTTACAAGCTCCTGAAAAGAAGAAAAAACAAGATTAAG
AGAAACCCTTGAATCAACTCAAAGTCACCAAACTTGCAAGTTAGTGTTTTACGAGCTAAAGCCATTAAGCTAGTACGAAACAATATTCAAGGTAAAACTT
TTTCCCTGTTTCGAAGTTTACAAATCAGAATAATAGTTAAGGCAGATACTCTGTTCAAATTCTTGAAAATCCGACAAAGAACAGAACTATACTTTGTCAA
AGTCTTAGAATTTTGGAAGCTTATTTGCCATAACAACGAAGAAAGAGAGGGCGAGAGAGATTCAGTACCTTAGAGGGTTAACGCACAACTACAGCTTTAG
AGAGAGCTAAAATCAGGAGCCGCGATTCTGCTAGGGTTTAAGATGGTTTTTA
>2
TAAAAAAAAAATAACATCATTATATAATATATAGAGTTTAAAACATCTCAAAAACAAATTCATCATATTTTGTGATTCGAAATTTTAAGAATGAACATAT
ATTAACTAATTGGCGAAAAATGCGTGGGTTCAACGTCCCGCAACGAATAAAATATTTTGACAATGATTCATAAACATATTATAAATAAGATCAACATTAA
TAAAATAAATAATTTTTTTTTGTGGATGGATTTGGTTTGGCAGGACGTTACTTAATAACAATTGTAAACTATAAAATAATTTACAAATTTT
ATATATATTAATTTAAAAAATGAATTGTCTACGCGGTGTACCGCATGTTAAAATTTAGTTTCTATATATTTTAGAAACAACTTTGAATTTATACTTTAAT
ATTGAATAAACAACACCAAACCCCCTATTATTCATGTTATCCATTTTTTGAAATAACAGAAAAATAGAAAATAATCATA
AGAAACCAAACAAAATATACACAACAAAAAATCAAATCATAAAGCTTTAAATACATATAAGTGAAAGATCAAATCATAAAACTATAAAGACATGAAGTAC
CTAAAACATAATATATGCTAAAAAGAAATTCAAAATACAAAATCTTCTACGTATTTGAATAATTCATCCAAACCTAAAACTGTATATCTGTTCACATATT
TGAATGAAAAATCAAATAAGCAAATCAAGCAAGAAATATTAATGATGTCTATGTGATTTTTTTTTTTGT
>3
TTAATCTGCTTTTTTTTTTTTTTTTTAATTTACTCATATTAGATTTAGCTTAATTTTGAGACTGTTAGCTTTCGGTGTGAACAAAAGAAATTTGTGAAAT
TTGATATTGTTGATACATTCTCTAGAAATTTTGGAAAGATTGTGTGTTTCTTTTCAAAATTCAAATATTAATAACGCACCAAAATATCTGAATAGAAAGA
ATAAATAATGCGCCAAAATATTGATATGATGAAAGGTTCCCGTCTCAATATGTTTTTAGACCCTAGGTAAAACTAAAT
TTACATATCCTTTTCACACGATTTTTTTTTTTTTTTTTTGACTCTTTTACTTAAAGGTTTTTTTAAAAAAATTTGCCATGCACCCTGGCAATGGCTTTTG
CCCCCACCTCCCCCACATTAAGCCAATCTTGTTGCATGGCCATCTCCCCTGACGAACACCATTAAAGATTCATCTATATGTGGTA
GCCACTGAGTAGACTTAATAGAGCATTAAATAAATGAAATTCGTGGATGCAAATTGTAGAAGAACTAGTATTTAACGGAGTGTTGCTTCATCACAAATTC
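1: From each record, I want to find the positions of the six-base pattern GCGT** (the two asterisks stand for any two bases following GCGT).
2: Non-overlapping
>1
GCGTGCGTAA
1 GCGTGC 0
3: Overlapping
>1
GCGTGCGTAA
1 GCGTGC 0, GCGTAA 4
4: The result should look like this
when Overlapping:
1 GCGTGC 0, GCGTAA 4, GCGTCT 220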
2 GCGTGG 121
3
when Non-overlapping:
1 GCGTGC 0, GCGTCT 220
2 GCGTGG 121
3
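The difference between the two modes can be sketched in a few lines of Ruby. This is only an illustration, not anyone's posted solution: the helper name find_sites and its overlapping: keyword are made up for this example. It assumes 0-based positions, as in the expected results above. The overlapping case wraps the pattern in a lookahead, which consumes no characters, so the scan can resume one base later and pick up matches that overlap an earlier one.

```ruby
# Hypothetical helper (not from the thread): returns [match, position] pairs,
# with 0-based positions counted from the start of seq.
def find_sites(seq, overlapping: false)
  # The lookahead form matches an empty string at each site, so String#scan
  # advances by one base instead of skipping past the six matched bases.
  pattern = overlapping ? /(?=(GCGT..))/ : /(GCGT..)/
  hits = []
  seq.scan(pattern) do
    m = Regexp.last_match          # MatchData for the current match
    hits << [m[1], m.begin(1)]     # captured six bases and where they start
  end
  hits
end

p find_sites("GCGTGCGTAA")                     # => [["GCGTGC", 0]]
p find_sites("GCGTGCGTAA", overlapping: true)  # => [["GCGTGC", 0], ["GCGTAA", 4]]
```

The two calls reproduce the small >1 example from points 2 and 3.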
Last edited by rubyish on 2014-04-24 19:04
dddddddddddd~~~
Last edited by klainogn on 2015-07-26 19:45
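class String
  # Returns the non-overlapping matches of pattern as "MATCH POS,MATCH POS,...".
  # offset is the position of self within the original string, so every POS
  # is 0-based and counted from the start of that original string.
  def srch(pattern, offset = 0)
    m = pattern.match(self)
    return '' unless m
    pos = offset + m.begin(0)
    s = "#{m[0]} #{pos}"
    # Search the rest of the string, carrying the absolute offset forward.
    ext = m.post_match.srch(pattern, pos + m[0].length)
    ext.empty? ? s : s + "," + ext
  end
end

# Scan the file "a" line by line and print each line that has a match.
# Positions are relative to the start of each line, not of the whole record.
File.foreach("a") do |line|
  rs = line.chomp.srch(/GCGT../)
  p rs unless rs.empty?
end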
Output:
"GCGTGC 0"
"GCGTCT 39"
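The script above scans one line at a time, which is why it reports GCGTCT at 39 (its offset within that line) while the expected result lists 220 (its offset within the whole record >1). A rough sketch of a record-based variant follows, assuming each record is a ">" header line followed by sequence lines that should be joined before searching. The helper names read_records and report, and the small sample text, are invented for this illustration and are not the data from the first post. It handles the non-overlapping case only.

```ruby
# Hypothetical helper: parse FASTA-style text into { id => joined sequence }.
def read_records(text)
  records = {}
  id = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      id = line[1..-1]             # header without the leading ">"
      records[id] = String.new
    elsif id
      records[id] << line          # join wrapped sequence lines
    end
  end
  records
end

# Hypothetical helper: one output line per record, "ID MATCH POS, MATCH POS".
# A record without hits yields just its ID.
def report(text, pattern = /GCGT../)
  read_records(text).map do |id, seq|
    hits = []
    seq.scan(pattern) { hits << "#{$&} #{$~.begin(0)}" }
    [id, hits.join(", ")].reject(&:empty?).join(" ")
  end
end

# Toy input made up for this example.
sample = <<~FASTA
  >1
  GCGTGCGTAA
  AAAAGCGTCT
  >2
  AAAA
FASTA

puts report(sample)
# 1 GCGTGC 0, GCGTCT 14
# 2
```

Joining the lines first means a site that straddles a line break is also found, which a line-by-line scan would miss.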