Originally posted by "xhl":
Over 400 million integers are stored in a file in binary form. Sort all of them by value and save the result to a new file.
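A common way to sort data that may not fit in memory (assuming 32-bit integers, 400 million of them take about 1.6 GB) is an external merge sort. You sort fixed-size chunks in memory, write each sorted run to a temporary file, and then k-way merge the runs. The sketch below assumes unsigned 32-bit values in native byte order. `external_sort` and `run_len` are hypothetical names, and the chunk size is something you would tune to the available RAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// External merge sort of native-endian uint32_t values from `in` to `out`.
// Assumption: run_len values fit comfortably in memory.
void external_sort(std::FILE* in, std::FILE* out, std::size_t run_len) {
    assert(run_len > 0);
    // Phase 1: read a chunk, sort it in memory, spill it to a temporary file.
    std::vector<std::FILE*> runs;
    std::vector<std::uint32_t> buf(run_len);
    std::size_t n;
    while ((n = std::fread(buf.data(), sizeof(std::uint32_t), run_len, in)) > 0) {
        std::sort(buf.begin(), buf.begin() + n);
        std::FILE* run = std::tmpfile();  // deleted automatically when closed
        assert(run != nullptr);
        std::fwrite(buf.data(), sizeof(std::uint32_t), n, run);
        std::rewind(run);  // reposition before switching from writing to reading
        runs.push_back(run);
    }
    // Phase 2: k-way merge with a min-heap of (value, run index).
    using Entry = std::pair<std::uint32_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::uint32_t v;
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (std::fread(&v, sizeof v, 1, runs[i]) == 1) heap.push({v, i});
    while (!heap.empty()) {
        Entry e = heap.top();
        heap.pop();
        std::fwrite(&e.first, sizeof e.first, 1, out);
        // Refill from the run the smallest value came from.
        if (std::fread(&v, sizeof v, 1, runs[e.second]) == 1) heap.push({v, e.second});
    }
    for (std::FILE* run : runs) std::fclose(run);
}
```

With `run_len` around 50 million (a 200 MB buffer), 400 million values produce 8 runs. Reading one value at a time from each run keeps the sketch short. A real version would buffer each run.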
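Originally posted by "THEBEST":
to aero:
This program doesn't seem to meet the requirements. The original poster wants to randomly generate a string of length 10000 and then automatically find the start positions of all identical (repeated) substrings. I've thought it over, and it doesn't seem hard to do, although some details are easy to get wrong. Efficiency, though, is another matter...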
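One way to attack the fixed-length part of this problem is to slide a window of length k over the string and group start positions by substring. The sketch below assumes the alphabet ACGT (as in the sample output later in the thread) and counts overlapping occurrences. `random_dna` and `repeated_kmers` are hypothetical helper names. It only finds repeats of exactly k characters. A longer repeat shows up as several consecutive k-mers, which still have to be merged or extended.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: a random string over A, C, G, T (fixed seed for repeatability).
std::string random_dna(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pick(0, 3);
    std::string s(n, 'A');
    for (char& c : s) c = "ACGT"[pick(gen)];
    return s;
}

// Hypothetical helper: every substring of exactly length k that occurs at least
// twice, with all of its 0-based start positions (overlaps included).
// std::map keeps the result sorted alphabetically, like the sample output.
std::map<std::string, std::vector<std::size_t>>
repeated_kmers(const std::string& s, std::size_t k) {
    assert(k > 0);
    std::map<std::string, std::vector<std::size_t>> pos;
    for (std::size_t i = 0; i + k <= s.size(); ++i)
        pos[s.substr(i, k)].push_back(i);
    // Drop substrings that occur only once.
    for (auto it = pos.begin(); it != pos.end();)
        it = (it->second.size() < 2) ? pos.erase(it) : std::next(it);
    return pos;
}
```

For example, `repeated_kmers(random_dna(10000, 1), 10)` lists every repeated 10-character substring of a random 10000-base string.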
Originally posted by "THEBEST":
As I said, my program has bugs, let alone being perfect.
I'll fix it up some more tomorrow.
Source String:
AAAAAAAAAA
The Same String in the Source String:
AAAA Length: 4
The Start Pos: 0 4
AAAAA Length: 5
The Start Pos: 0 5
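Originally posted by "bioinfor":
Personally I think this is already very good. These two problems mainly test algorithms rather than syntax, so for someone who doesn't work on algorithms, getting this far is no small feat. I certainly couldn't reach this level myself.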
Source String:
TAGCAGCAGGTTTACGGACACCCTCCTCTCGGCTACTGCTAGTGACGGGGCGCTGCGAGGCGCAGACGGCAAGGTTCTAGCCAAACGTATGCATCGATATTTTGAGAGAGAACGGGGCCTCCTTCCTACAGAAACGTGGATGGGACGATAAAGAAGGAGATTGGATTGCCAAAGCAACATTGTGAGCAAATAAACAGGCATATGGTCATACGGTCGCATCTCCATGGGATGAGTGGCGTAAACCCGTTTGGATGCGAGGCCTGAAGGCCCTTTGTCGCACAACCCCTAGACACGTGTCTGCACGAGACGGCCGGAATAGCTAGTATGGGAAGTCTAGAAGGGGGTTCGTAGGGGGGGAGGAGCATGACTTGGCCGACAGGTCCGTCAGTTAATATATAGCTAGTGGGATCTACGACGGCGTCTTGGGGATCCAACATTCACGCAACGTTCATGGCACACTGTTATCTGTGATTTATAACCTGGTCCGGACCGCACTATTACAGGCTTCAACGACTAGAACTTCAGCGATAGGTCATTAGAGCTGTCGGGCACGAGTTTTCCGTCACCCCAGGCGTGCATTGCATCCACGGTGAGTTATGATATGCCACAATCAAAGAGGGCGACGACTTATCGTCCGATTGGGAGGATAGAAGTGACAGGCCTTATTCAATCCCTGTGTATTACTAGTTGGGTATTGGCAACCGTTTCGGCTGTTAGTAATCAGTCGTCCAGATAGTGGCCATAGTAGGGTCATCGGGCCTAAGCCCGATTATCCCCTCGTGACAGATCCTCGTAAAAACAGGAAAGACTGACGGCGTGAGTCCAGAATCATGTAAATTCCGACAAGAGCGCGATTTAGTACGCGGCCTATGACTTAGTGAGCTGAAGGGAACTCCAGACCCTCAAGACCGAAATCCTCTCAGTAGGCACGTGAGCATCATCTAGATTGACCAGCCACACGCGAGCCTAGGGGGCAGGCGTACATAAAACAATGCGGGAGACGGTACAAGTCTAACGCCTGGCGCGTAAGGGATAAAGGTCAGAGCTGCTCTCAATCATGCAATACGCGCCCTTCTAGCCAAGGCTCAATCCGCCGATAAGCCGAAGTCGTGTCATGGTAAATACCGGCCTCGGTTGTCTCACAGCTTATTTTTGTAAGTCTTCAATTTATAACGTGGCATAAGTACTGCGGCATTTGAGCTACGCTTCGCCTGTAGGCTGCCGGCATGGGAGTTGTATCAGCTCATCTTCTGCAAGTCGTGTGCCCTACTAAGGATACGCCTTCTTGTTTGCTAACGCCTGGATGTTGACGGCTATTCGTAGGACAAATATGCCGTGAATCGCCTTCATTTACCTATCACTCCTTACTCTAATCTGGATACAGCCATCGTTACCAAAGCTAGGTGCCCAACTTCGCCACTTTGCATTTTTGGTCGTCTGGTTTCTGTGCCCGACTACCGGGCCGGCCTGGTCACCGCGGTCGATCACGAAGATCCGCCCCTTGTTAATTCGAAACAAACCCGGATGTGTTCACCCGTTTGTATCAGCGTTCGTTGAGGGGGAGAACATGAGATTCGTCGGATAGATGTTCGGAGGCTTGTGAGGCCCAGCGGCAGTGATCAATTGCATAGTAAGGGGTGCCCCAGTAGGCAGTTGTTCCGCTTAATGGCAAGTCCGCTCAAAGGCCGTCAAGTCTCGGTCACATGTAGCCCATCTGGTTAGCCTGCCCCAAGCGCCAAGTCCCCCGGAGAGAAGAGGGAGTACTCGAGGTCCGGACAAACCCGCAAGCGCCGGTTTACTAAGTTACGTCGTGGCTCCGGCAGGATCCGGCACGCTTTCTTACGCGCGGGAATGAGAGCTATTGTTCGAGTACCACACCGGGACCCAGGGTCCGGGCAGGGTTACTCCCTATTGAACCCCGCCACATTTGTGCAAATAACGAGGATACCCATACATACCCAGATGGGTTGGCCTAGCGCCTGAAATTTGCGCGTTTTAGC
CGTCACGTGGGTGCGCGTAGAAGCAACCCAAGCAGGGGACTCCTGAGCTCTGACTAGACCCATTATAATGACACTCTTTCCACTTGGTCACACGATACACTCCGCAGGCACTCCGTGCCATACCGATGGATTGGCCGTGCAGCCACCATTAGCAGACTGCTCCCATCCACTTAATTACTGATTCTAAGCGATCACTCGAGCATTCTGTTATTATCGGTGGCTTAGATGCAGAGGCCTTTTACGCGCGAAGCATAACCTACAACAACTCGTGGGTTAAGTTGGACGGGCGTACTAGCCGGATCGCTTTGCGAACAAAGCGAACGAAGATTACCATTTGCGCCCAGATACTCTTCCTACACTAGAAGGGATCTAATACGGCACCGAATGCAACTGTCGGGTCGTTAGTTGCATGATGGCGACTGCGAAGGTTCATGCTTGGAAGCGACAGATCGAACCGGTAACCAGTGTTAATTAACCAAGCTAACAAGAGCCTTGGTGCAGCGAGGTCTACCAAGCCATAGTTATGCAGTTCCCACCTTAGAATGGGTAGCAATTCAGGCAGTGGTCCACTGTGACCTCCCTTCGAGACTACTGGTTCGCTGAGAAGGGCGAGGCTAAGGACTCACACGCGTGTGATTCTGCGTGCGGGATGATAAACATCAGGCAAAGTGGTAAGCGGTATCCCGTGCTTCGCCGAAGGTTCCTTCCAGTCTTGGATTACGAATGAAACGTCCCTACCCCCGACGCCTTCCTAATGCGTTCCTTTGCAATCCTGCTGAGTCCCGTTGCTATTCGATGCGGGAGATATGATGAAGCACACTAGCGCTAAGGGGCTGTTTGATAATTAAATTCAATAGCAGGTCGAGCTACCTTGAAGTAGGACCATAACAGTCAACAACCCGGCGAGTTAGTGTGGTAAGCTAGCTGTGCAGAGTACGTAAGCAATCTCAACCTATGGCAGTGATAACTCGTTAGAAGTAGTACCCTGAGCCATGGCGATGGCCTAAAGGTCTCTTCAAATTACACACCGAGCAACTAAGGTGTGGCCCTAAAATCGCGACCTGACATCCGCCTACACTGCCGGCTCTGTCTATGATTCTATCGCCATGTCTCTATAGTCCGATGACTGAGGACCTTAGGGAGGCCGGTGATCTCTTTTAGCATCCATACTCGATGCTCTATGTAGATTCACCCGTTCTGGATAATTTGCCTCCCTATGTGTCAAGCCGGCCTCGAGCTGGGCCATGTTGTAAGGCAGCCTATTCTGATCTGGTGCAGCCAGGCGCGTTTCTATAAGAGAGAGTTCTAAGCTGATGGTGGTGCAGGGACGACCGAGTCCCGTTATCAACTGGGGTTGTCAGGAGTTTGAGTACCGCGGTGAGCTGGAAGAGTCATTAGATCTGCCATCTGAACATGCCTAAGCGGTCAACGGCTGCGTCAGAATGCCTCACGGACACCCGTGTGGTGTATCGTTGGCCTACACGGATCGAAACGTTTTTAAATAGATCATATCGCCTAGCCCCTTCTCATATAGGCTTTCGACTCCCGAGTCGACCTTTCCGGACATGTACCTACCCAGCTAGACAAAGTGGGGATTACGAGGCGTCACGATTGGTTTCACGACTCTCTACCTTCGCCAACAGGAGTAGTCTGATCAATCTCGTCCTGTCTCCGCCGGTCACTTTCAGCTTTCTCCACATCCGAGAACCATGTCTAGCGATCTAAGGGGTTCCTAAGCAACGGCCTTACATAAACTTCAGCGATAAGCGGCCGCCAAGCCTCTCCGGAACTCTAACGATATAGACATTGACGTCTTTACTGTCATTTTTGAATCTGACGAGTAATATTAGTCCATTCAAGATACACGGAGGCAAGGGGGAGATCATAAATACTAAAAGAAGACCATGAAGCGACTACTGCGATAGTAACGACATACGTATCGTGCGTCATCCGGAATATCGTTAATCAGGGCCACTTACATAGCATTAACGATTACCAAG
AGCAACGCCAGCTGCTCCCAACATAGCCGCCTAAAATCTATCCCACCGTCGCGTGCCGGTCTGAGACAATACCGTTGCGTTTCGAATTGGATCGGAGGAAACATTGTAGCGACGTTCAATTCTGGGTTCCGAACATCGTGGGTAGATACGAAAAGGATGGCGTCGATATGTTAACTATGGAAATCTGGTAGAAGGGAGGGGGATGTCGCATAGAAGGGGTTGTCAGATATACAGGAGTGATTTTTTTTAATTACTGTCACAGGGGCAAGTCCATGGTCGGCGCCGCAGTGTTTCTACATGACGGGGTCTGACGCTCCGGCGAACAGCTTAGTTTAGTGTACGGGATCAGAGATAATCGCAGGGGTGACGACAGATCCACCTGAGGGTGCCCGACGTACCTTAGATCAATGAAGCTTTGACAGCCTATTGGACGGCAGCCCCTCGCCTTGAACGTAGGGCACTCGTCTCCACATCCGGGCTTTGCTGTAAATACCTTGGAGGCCTAGTATTCGGATCTAGTTGGTGAGTTGTTTGAAGGCCGGTCTTGCTCATGACAAATGGTCCTCGGATTAGCGTGAAGCACCCCCTACGATCCGGTCGGAGCTCGATTATGTGAATCAAGGGTGACAATGAAGCCGAATTTATATCTAGACAATAACTCAACGAGTATAAAGCGGATTGCAGATATTCCGCGCACATTAGCTTGCACTTGGGGGTTATCTTCAAGCTAACCCACCAAGGCGGCGACAATGCGACGAGTCTGGTCATCTCCTCCAATTGGCTAGAAATTGGAGCCGGGAGCCATCATATTCAACGGTGATACTGGAGGACAGATTCTGTTTATATATACTCGCACCCTGAGGTAGATTATCTTAGCTCTTAGGCAGATTAAGCCGACATATCAGTGCTTTTCCATGGACGGACCGCCCCGTAGCAGGACGCTCCTAATTGTAAGTGTGGCATTTTGGGCGAGTAATATGGTTGTTTAAAGTTAGACAGCGCGCTGTCTGCACGTTGCGTGTGTATACGTCCCTCACGAGCGCGTTAACGCCGGCGGTATTCAGCGCGCTGGGATATATAAAACTGCCGCTAGCTCCCGGCCGCAACTTTTATGATATGTAACGACCCCTACGTTAAGAGGAGGGTCATCCCCGCCGTGTCCGCCTGGGCTGACACCAGTTAGCTCTTGAAGGTAAATGATGTGCCGCACAAATACTGAGGAAGGCTGCTAACCTGCTATAGGTGAACGCAGATCTGTTCGCAAGGCGAACGTGCCGCCTTGGCGACGTATTCACTCCGTGACTGGCCGTACCATCTAAGTGAACTTGCCCAATGAGTCATCATGCTCGATTTTCGGTTAATGGAAGTCTCACCGTCTCGCGGGATTTTACCCATTTCCACGTTCCGTCTCTTGAGCACGGGCGACCCGAACAGACCGTGACAAAGGTCCAAGACGCCTATGGAACTATACCCAGGAATGGCATTTACATTGATCAACCGCTGTAGGTTGAGTCAGGAATCGCCAGGGAACTCTTACCCCCGATACCGCCCCATGCGCTCAGCCGTTACGGATAACCCGCCGCGCGGAAGACACCGTAAAGCGTTGACGAAGGCCTATAACGCTAAGCGGGTCCCGGTTGTAGCCAAGGCCACCACAGATGCGCGAGCCGGCATCTACTCCGACATACAAGTTCATGACCGAAACGCTAGCAGAGGGTTTAGCCCCTATGACTCGGGCCCGAACGATTTGCGAAACAGAATGACGCTATTCGCGAGTGGATAAGCCAAAAGAAAAGCTAACGGCAGGTTGTGGCCTCGCCGTGTAAGTTACCGAAGCGGGCTTAACTGCAGCATTCGCGAACCTAGGGTTTCAGGGGATGGCTGTTGTGACCTTATCACAGCGTGTTCGCATGCGGACTATCGTGCTATGCAGAGCGGTACCAGTGACCTCATATCAGGGATCGTTAGATGAAATGAGACAGCTCACACCAGGACTCATG
ACTTCGGTCTTATGGGCTGTCTGGAGGGACACGGGTGTCCCTAGAGCATTGGCAACGTAAATTAGAATGGTCGGCCCACGACCCTACAATGTCGTGAACACACGTGGTAGTTGCGCATCCGACCACCACGACGCGCAGATACGTGTCCAACTAACGTGACTCGGTCTCCCTGCCTGAGTCACCTGACCTTCCTTGATAAGGAGTTCGCCTAGGAATAACGCCTAGGCCGCAAATGGGTGTTTTTAGTGGAGCTATTGTGAAGTACGGTGCAAAACAAGCCCAAGTTCCCGGGCCGCGATTTATCGGCGAGTGTTGGAAAGCGCCGTCTAGCCCAACTTAATACACTCCCCAGACGACTTTCCTGCCGAGCCGCCGTAGACATTGCCGTTTCATTCACGGTGCCCCCTAGAAAAGGGATATACTGAAACGAAACTATCATGCTACCGAACTCTCTAGGGACCATGGTCTAGGCAGCCTGTGTCATATATTTAGATAGCCAGGGGGGATTGATTAAGATCACCCCGCAGGGATGTCCTCGCTGGCATGTCCTACTGTCTGGCCCTGATTTAAACTCCTCCGGCTTGCGCCCATAGATGTACTTGTGGCGTAGTACTGATTTACTGCGCCTCATCGTTCCCACCAAGACATCAGCGTGGTACAGGAACCTGCTCTACGAGGGATAGCAAATGAAGTAACAGAGTCTAGTAGTCAGCAGTTGATATGTGACGTTGACAGACAACTACGTATTTACGTTGCCTACGAAACATATGGAGCCTGAACAAAACCCAGAAGGAGTCACTGCTAACCCATGGGAACAAGGTCTACCCATGCCATTACACAGGCCCAGGGGGCTACTACCTGATTTGCACGACCGATCGGTGCAAGGTTACCCCTGACAGACGTTTCTGTCGGTTCATTCCTCGCGTTACTCCTGGGGTTACACCTGCAATGATCATAACTTATGAAATAGACCACTCGTAAGTATAGTTGCGTGTTCCCATACACCTCCCAATTTCGTAGTATACTGCCCGAGTTATTGGTCTCCTTAGTTTATGGCCTTTTTACCGACCTGTACGTGGCGGTTTTGTTCACCCTGAGCCCCCCCACATACGGATCGAATTTGAGCTCTAACGCCAGGAGGCACGTACGCAGCTTTCAGGGGATAGTGACCAAAGCGCGTTACCGCTGAGCACTCATAGAATGGGGATGTTAGTATAACCGTTTAAGTGGGATGACGACCGCGTCTCACCGCTGCTTCAAAGAATTCAGTGCTTCAGAGCTAGCTATCGGGAGGAAGGCTAGGTTCCAAGGCACAGGGAAGGCCCTCGGTTTGAGGTGCGGATAGAAGCGCACGCCCCAACCAGTAGGAAGGGTGTTAAAGACGATCAGACCGGATTTCTACTATTGCGTCCGGCCATCCCTTGAAGTCCTGCCCCCAAGGTGTGGACTGGAAAGGCAGATGCGCGTAAGTTCAACACTTTGACACTCAGCCACTTGTGGACAGGAGTCTGGTCGCGTGACCTTAAACTTGGCAGGCGGGGAAGTCCTACGCATTCTCCCTTGATAGAGCACGAGAGACTACATCGGTGCGTATTAGCAGCGCAAGTTGGCCCCTTATTCTAGCCTATGTTTTCCTGTATGCCGTTATCGTATGCCGGGATGATGGTTTTAAGAGGCCTGCGGTGGAGTGAAGCGTTAAATGATGGATCTTAGCTGCCTAACTCCCGGTCTAGATTGAGTGTAGCCGGGTCACAGCGGTAATCCACGCTGCCAATTTTCGTATCTTTATAGGTTGGCACTTAAGTCATGTCGGACAACTAGTTTCCCACGTTTCAAATGTACCTCTCTCAATCGCTCCGCATCCAGCCCGGGACGATAGCTGGAGGATGGGTGTGAAAGCTCAACGTCAAGTAAAAAACGGCCGACTACCTTGGTGCCGTATGTGGTGTGAAAGGAATTCCCCTTTTTTGTAGCCTTATGTACGACGTATTTGGACACCT
TCTTACAGCCTCAAGTGGATGGTTGGTGTACGCCGCCCGCTGTCGAGTGACGAAGCTTTCGAGCCATAGTAGCAAACCTGACCGAAATATAGTCCTTATTCACAGCGGCTCCATTTAATTCGCGCGCCTGGGTAGAACAGGGGTGTATTGAAGGGTTATCCGGGAGTCGGTACGTCGCTAATGTTGAATTTGGAGGCATTAGTAAGTCCACCCTTTACTGATACATATAAGGGGGTATTCCGCTCCTACAGTGAGAACCTGTGTCGTAGCGCTCACATTGGTGGCCTGTAAAACCCTGATAAGTAGCTGTTGAGGACTATTCCGCGTCCGGCAATCGCCCTGGTCATTGGAAGTGTACCCACACCAGTTCAAAACCGGCGCGAATACCTAGTGCTTTGTTGACTTCTCACTGATTTCGGTCCTTAAGACACTGACTCCCGCTCCACTCGGGGGCATTGGGCTCGCGTGTTGATAAGGTATCACCCAACGCGAGGGCGGAGTATAAGACAGTAGAGAACACAATTATCTCATTTAACGTATTGACCGCTGGTCTGCCTACAGTCTCTATACCTATGCGCATACGTGATCTGAACCGATCTTGGTCGAGACGATATAGCGGTACTAGACGTCTAAGCGATTGGCAATAGTAACATCATCCGTTACGCTTAAGGACGGCCTACGCCTGGTGTTTCGGACAAGGCGTCTCGGTGCAGTCCGTTTGACTATGGGAGCTTCGGCCTTTGACAGAACCCTGTGCTTAAAGTGAACTATCGTGGACTGGAACTTTCCCAAGATTGTGATATTCGGCTGCTGACCACCAATCAAAAGTGATAGGCTACGGGACGCAAAAGTGTCGGTGTCGCAAGTATAATGTTGAGAGCGGTTGAGCGCGATGCGTTGTATGCTTGTCGTCGATTTAGTGATCCGCCGCGGGCTCTTTTACATTATTATAGCTTGTTCTAGCATGAATATTACCTGAACTCAATTCATCATTGCATTTATGCCCAACTGCTCTATGACATGGCTACACAATGAGAGATCCGGTGAGGACAAATACGCGATCCTCGAAACCGGCATGGGCTTCGCGATGATGTAATCCGGAATTAGGCCGGTCGAAATCTCGATGAATACCAACTCAGATGGAGGCGATGACCGCTATGTGCTTATACGCTATGTCAAGCACTCTCTGACCTCGTTGTTTGGGAGCAAGAATCTCTGGCCATCTTTCCAAGTAGCGAACTTCAGGGGAAGTGGCGTCCGTATAGTACAATCGAGTGCTTGTCGGTGCTTACTATCGACACACCGCAATACTTAGCGTTCTCTGTACGCTCCCGGTCGGACCAGCTGACGATTTTGCGAGCATCCTAGAGCGAGGCGAAAGTACAATATCCCATTGTTAGAGCGAACTAGCATATCAGAATAAACTGTAGCATTCACGCACTCATTGTCTCTAAAGTGAAATATTTCCAGTCACGCGCCCCATAGCGTAGAGAAATGGGATGTCCCTCACTCGACTCTCTTATAGTTCGGACATAAATGTCCCTTCAACCGTATCGACCACCGCCCGGCGAACTGCTCAAGCCGTCCCAGTACCCTAATAAAAGACTCACAGACCCAATAACCGCACTTAACCTTAGCTGCCCTTGGTTCATTAGACGCGAAAGAAAGTTTCGCGCTTCAAAATACTCCCCTGCCGCTACTCTCTAAGCGGAACGTCCTGCAGGCTTCATTATGGGAAGGTCAACATTCATCCGATAGTTGGAACCCACTCAGGATAAGTAGTCCAGTCGCCCGTTTTTAGGAATTGCGGAAGGGGCGCCTGAGTACTTACCTCAGACCTCGACTCACGAAGTTCGGAGATAGACTCGTTCATATTTTGCGCACTCTGAGCACGTTGGGCTTTAGACCGAATGGCGAATCTGTCTTTATAATTTTAACCTCACCTAGCGAATCTAGAATTCGGGCAATAAACGTGCTCTTCATACGTAATTGGACAAGTC
The Same String in the Source String:
"AACTTCAGCG"
Length: 10
The Start Pos: 518 3753
"ACCCTGAGCCC"
Length: 11
The Start Pos: 7091
"ACGGACACCC"
Length: 10
The Start Pos: 14 3450
"ACGGATCGAA"
Length: 10
The Start Pos: 3482 7110
"ACTTCAGCGA"
Length: 10
The Start Pos: 519 3754
"AGGACAAATA"
Length: 10
The Start Pos: 1322 9047
"ATGCGGGAGA"
Length: 10
The Start Pos: 992 2796
"CAGATGCGCG"
Length: 10
The Start Pos: 5656 7457
"CAGCGCGCTGG"
Length: 11
The Start Pos: 5062
"CATTCACGCA"
Length: 10
The Start Pos: 435 9431
"CCGGCATGGG"
Length: 10
The Start Pos: 1222 9071
"CGAGTAATAT"
Length: 10
The Start Pos: 3838 4970
"CTAACGCCTGG"
Length: 11
The Start Pos: 1012 1293
"CTTAGCTGCCCT"
Length: 12
The Start Pos: 9633
"CTTCAGCGAT"
Length: 10
The Start Pos: 520 3755
"GAGCTATTGT"
Length: 10
The Start Pos: 1856 6249
"GAGGAAGGCT"
Length: 10
The Start Pos: 5218 7291
"GAGTCCCGTT"
Length: 10
The Start Pos: 2778 3334
"GAGTCTGGTC"
Length: 10
The Start Pos: 4757 7505
"GCCATAGTAG"
Length: 10
The Start Pos: 739 8063
"GGGGTTGTCA"
Length: 10
The Start Pos: 3351 4216
"GGGTTGTCAG"
Length: 10
The Start Pos: 3352 4217
"GGTGTGAAAGG"
Length: 11
The Start Pos: 7945
"GTGGACTGGAA"
Length: 11
The Start Pos: 7443 8773
"TAGCCAAGGCC"
Length: 11
The Start Pos: 5641
"TCCACATCCGG"
Length: 11
The Start Pos: 4467
"TCTAAGCGATT"
Length: 11
The Start Pos: 8629
"TCTAGATTGA"
Length: 10
The Start Pos: 941 7729
"TCTCCACATCC"
Length: 11
The Start Pos: 3692 4465
"TGTCTGCACG"
Length: 10
The Start Pos: 295 5005
"TTAATTACTG"
Length: 10
The Start Pos: 2171 4247
"TTATGATATG"
Length: 10
The Start Pos: 595 5110
"TTCACCCGTT"
Length: 10
The Start Pos: 1530 3188
"TTCAGCGATAAG"
Length: 12
The Start Pos: 3756
"TTCAGGGGAT"
Length: 10
The Start Pos: 5870 7155
"TTCTAGCCAA"
Length: 10
The Start Pos: 75 1073
"TTGTATCAGC"
Length: 10
The Start Pos: 1234 1539
"TTTCAGGGGA"
Length: 10
The Start Pos: 5869 7154
Put a newline at the end of the file.
... Or you could always apply this patch (generated against release 3.3.3,
so may need a little fuzz to fit into whatever version you're using), and
then use the flag -Wno-eof-newline to disable the warning....
The Same String in the Source String:
"ACTGTGCAAC"
Length: 10
The Start Pos: 1936 2275
"AGAAAGACGG"
Length: 10
The Start Pos: 5288 6068
"AGAAGTGAAA"
Length: 10
The Start Pos: 8578 9238
"AGAATGTGAC"
Length: 10
The Start Pos: 8275 8559
"AGTCCCAGCA"
Length: 10
The Start Pos: 5036 6622
"AGTTTTTACA"
Length: 10
The Start Pos: 1178 4526
"CAAGAATCTT"
Length: 10
The Start Pos: 1036 2537
"CAATACTGTAA"
Length: 11
The Start Pos: 4996 5213
"CAGTTGACCA"
Length: 10
The Start Pos: 2047 4579
"CATTAGTTCC"
Length: 10
The Start Pos: 533 2793
"CCCTGTAAGT"
Length: 10
The Start Pos: 2441 4074
"CTAGTGCCCA"
Length: 10
The Start Pos: 7676 8264
"CTCGTGAACA"
Length: 10
The Start Pos: 291 4039
"CTGGAAGCGC"
Length: 10
The Start Pos: 5577 6157
"CTGGAGAACG"
Length: 10
The Start Pos: 2159 5338
"CTGGATACGGC"
Length: 11
The Start Pos: 684 8492
"GCGAAGTTGG"
Length: 10
The Start Pos: 1106 1406
"GCGTTTATCT"
Length: 10
The Start Pos: 4614 7139
"GGGCCCGCCT"
Length: 10
The Start Pos: 406 5890
"GGTTGAAAAC"
Length: 10
The Start Pos: 6122 8812
"GTAAACAGCG"
Length: 10
The Start Pos: 5003 7599
"GTACTCTGCC"
Length: 10
The Start Pos: 7920 8022
"GTAGTAGGGG"
Length: 10
The Start Pos: 3142 4853
"GTGCTACTGA"
Length: 10
The Start Pos: 4457 9701
"GTGGCTGGCG"
Length: 10
The Start Pos: 2211 7080
"TCCGTCTCCG"
Length: 10
The Start Pos: 6790 9434
"TCTATCCGTC"
Length: 10
The Start Pos: 3350 6786
"TGAATCATAC"
Length: 10
The Start Pos: 2391 8115
"TGGACTCATCG"
Length: 11
The Start Pos: 8729 8952
"TTGCTGGTCT"
Length: 10
The Start Pos: 3828 9511
Source String:
ATAGCTACTACGAAGGGGCGGTTCTGGCGCAGGGAATGTATTAGCATCGCGAAACAATAGTTGGCCAATGTGCAGGCTAGCCACTGCTAGAAAAAAATGTTTAACTTTAAATCATATAGCGTTCTACAGCAACCACTTGAGAAGCTGCATCTGCCTGGCACTGTTACCTCACATTGCGCATTGTACCCAAGGACCCTTTACCAGACGAACGGGAAACTTACACCTTACATTTCTTATGCTATGGGCATTTCTATCGGACTGAACTTCGCCTTCGAATACGTAAGTCATCCCTCGTGAACATAAACAGCGTAAGATGCGGTATGGACATGATTTCGTCGAACTACGCACGCATTTATCAGTTCCGTCTTCATCGAGGCATGGAGCTATTAAGGCAACCTTCCCTCTGGGCCCGCCTAGCCCTCTCGATCTCTTGTTTGATCATGTACAACCCCTTGTATTCACGTGAATGTCAAACACTAGCCCTAGGACTTCAACCTCCGTAGGTGGCCCACACGATCACACCGGGATTGTTCATTAGTTCCTCACTTGGACCCCGATGTATCGTGTTTAGCTAGTACAAGTCAATAACTACATATTCATAATTAGCGTATAACAGATGCCCCTCAATCAGGTCTGCTTGTTCGGAAATAGTAGGTGAACTAGCACCCCTAATGCGGTGCCTCCTGGATACGGCATTGTATCGGTCCCTACGACTTACGTATGAGTTAACGACTGTTACGCCGGCTGGATGTCTAGTGTATCAAGTTATGGCTTACAACCTTCAGGGCGCGACACAGCTTTAAGTGATGTGCGGAGTGCTCCCTAAAACTGTTTGGACGCCTGATCGGACCTGGTCCTGCTACCTACAGGCGGCAAACATCTCCTAGTCTTGATATTGACCTGGGCACTCCGGGCGGTGAGCGAACGTAAAACCAGATACGAGCAATACGAAGATGTGTCACCCTAACTGGGACTGCTAGGTAAAGAACGCTAGGACACTCAGTCCTCTCTCAACTACGGTAGTCTTGAAACCTCCAAGAATCTTGACCGTCTGGCGAGGCCCCTATTTGGCGACTTAATCCTAGGGTGCAAGTTGGGTCCGAAAGCGAAGTTGGTCCCTTCACCGTCATCTGGTTCTCTGAGATATTTTCAACTAAGAAAGTGGAAATATGCCTTCAGTTTTTACACCTTGGGGCGCGGCCATCTCGCTATTTCTCCCCTATATACCACTGTGTTGGCATGCAGCTAGCGTCGCACAGCCATTCGCCCGTCAAACTGATTAATCCAACCCCCCACTTCCAGGGGAGGAGCGATTGGCCATCTCCCTACTCAATTGGCGTACGAGTGGAGAGAGAGTAGCTTCCGCCGCACCTGGGTCGGTTAAGCTTCGCTCTGAAACACCTTCGCGAAGTTGGGTGCAGTATAGAGAATGAATGGCCTCTTTACCGCCCGCTGTGGGGCAGTCCCATTTCTCCCGAGGGCATACTCACCAGGTATTGAGAGCATTCGCCGCACGGGACTAACCACAAAGTAATGGCCAGTCAGACATGCCCTAACTCTTGATGGCATTCCTCGGAGATGCCTAGACGGGGAGCTGTGACCAAGACTCGCGCGCTTACAATCATTATGTGCTGCGCAGAATCGTTATCTTGTACCATGAAGGTATCGGCAACTTGCGTCACCGTTTCTATCGAGGGATAAGATAAAGCACATTACACTCATGGGACTACCCGCCCGTTGACTCGATGTCAATAATGTAGGGTAGAATCACTGACTCGTCGGTTATACCCGCAGGCTACAGTCAATGGATTGGTTATATATGAGGGGCCCTACTACAATGTCAGAGCCCGAGAGGAATGGAGGATCGTTATAGGGAGCAGAGACAATTGGTTACTTGCGCGTTTGGAAGACCGAGATGTACGACCAGCATCTACGACTGTGCAACATACGTGTGGGTAGTTTGGACCCATGAATATTGTTGCTGCCCCGCAACCGCACTA
AACTTCCATCCTGGGGAAAGTTAACAAATTCCTAGATAACAGGAGTCAGTTGACCATCCACATGTAGATACCACCAAGAACGAAGCTTGACTCAACGTATCCTCGGTCGCGAACAAGCTCACTACGCCGTGTAGTAGCTTGGTGGTGTGGCGAAAAAACTGGAGAACGAGGAGCACCCTAATATGATCTACGCCATGAATAAAGAACGATGTGGCTGGCGATCGCATGAAAAGAACGCCATCCCGCTTAGCTACAACTTATATAGGGTTGCTGTACTGTGCAACTCGTCCAATGGACGAGTCATCAATCTCATAGTGCATCAGTCATTCTTCCTTCACCAAAACTCGGTTGACGATTAGCTTGCTTGATTGTATAGGCAATATTCAGGGATGAATCATACGTAACTGGGGCTTAATGGCATCCTGTGCACCTCAGTTTGCCCCTGTAAGTCTCGATTAACAGGGCCCCGATACCTGTAATGCCCTCCGCGCGGGTGACCCGGGGAAAGACATTGGCTAAAGTATTCTATCCTTATTCAAGAATCTTTAAAAGCCCCTGTTTCCAGACATGGTTGCTAACCGCGAGCGGTTTTTCTTCAATAGGACTATACTACTCAATAACGCCGCACGCCTTACCCGCACCGGCCGACTGTTAAAGATATTACAATGGCTTGGGCGCATAAGCTCGAGTCGGGATTCTACGTTTGGAACAGCAAATAACCGCGCGGAACGGTGAATGTATGGTCCCACTCACCCCGGCCTGTCACATACATTACCGACTTAGACACGATGACATTAGTTCCGACGGCCCTTAAGGCGGCTCCTACGTATTAGAATTGACGTATATGCCAATGACGCATGGGCGTATCGGAAGTTATCCCGTGTTAGACAGGTATGCTTCTTCGTGCCCCAGGACCCAGCGGCGACCTGCTTACAATGGGGATCGGACCCGCGGCGACTCTATATTCCAACTGTCCGAAACTCATACGGTACAGATTTTCATGCCATGTCCCGGATTTTTACAGGCGATTGCCCGTATACTGGCAGGGCCTCCCAATTCACAAACACCCGCCATTGTTCTAAGGCAAACCTCAGGGTCGACTCGAAGGAACAGCTAACTCTGTGACTTTAGTCTCCCCACCGTAGTAGGGGTAATACACACCCGTGCCGCACACTAAAGCTCATGCTCTCAACTGGCAAGGCTATCATATTATAGAAGCAGGCATTCGCCTCTCAGACTCGTTTGCTGGTGGGAATTGTGCACCACCCGTTGGCGGAAAGTAACTTACCGATAGTGCCGTCCCTCCTGCTCGCATTGGCGCTTAGTAAGAATAACTCGGGGCCCTCTACTCTATCCGTCATTTGCAGTTGAGCCCAGCTGGACTAACTGGCCATGTTAACATCTAAACACTTTGAGTTAACACACCGGGGCTGGAACGTTGCATGCACTACTTACTTAGTGAGCTTGCGTCGAAGACACTGTCCCTACATTTTTGCCGGTCGTTTAAAGCAAGAAACCTTCCCCGTTCGGCCAAACCTTTAGCCCGCTAGCTTTTGCACAACGTGGCAAGGTGATCGTCGGCTAACCACCCTCCGGTGGCGCGGGTTAACTGAAAGGTCCTAGCATCATAGTCAGTTTATCCAGACCGACACCTGGCTGAGGGGTGTTCATAAGTGACGGCGCGCTGTAACTAGCTGCCCACAGCGCTGAGCTTCGCTAGCAATCCACCTTCATGGATGAACCACATCTCAGACCTCATAACGTGGCGCGCCCGCCTCCCGTGACAATAACCTGCAAGGCCTTTACTTCGTACTAGACGAGAGGGACTTGCTGGTCTTGTTGGCGTCTGAGCAGTGCCCATACCCTCTCAAACGGCATTTCACTAAAGATGACGTGGCTCGTCGTTGTTCTTGGTAAAAATAAGTCGTGCTAGGGCCGTTTTAGGGTGGTATGAGGTCTGTTCTCTTGTGACATGATAGCAATTCCACAGGCTTGTTTTA
GGGGGTTGTATTACGCCCTGCTGTCCGTAGTTACCCCACTCGTGAACAAGTCACACAAATCAGAGAATGCATACCCTGTAAGTTAATGGCCTACCGCTTGTACTGGGCCTAGTGATCAAGGAAAGATCAGTACTAGGCGCGTGCTCGCTCGGACGCAGCGGCACCTAAAGACGGGCTCACGTGCAAGGATTACTCTCACCACCCCTTTGAGAGCTTGCCAATATACTATGCGGCAGTTTGGTCCTAGAGTTGGGAAGTGAGCCGGTGTAGGGGGCCCATCATTAATCGCCCCCAAGATGCAACAGCCGACCCACCTCTACATACGAAAACCGTCTGGAGGATCCGCGCCGACTAATACCTTCTCGAGGGGACGCGTTGGCAATTGCCATTAATGTTTTTAGAGGAACCCGCACCCGGTGTTGCCCGCGAACTGGACAACTCTCAGATTCTAAACTCGTGCTACTGATGACTTAGCATTCAAAGTGCTAAGGAGGGATGTGGTTTATTCGAGTTCCCCGATCGATTAGTTTTTACATTCTCAAGATCCCCATCACCTAACTTAAAACGCAGTGCGTCGCCAGTTGACCAATAAACCGCCGTGAAACATAGCCACGCGTTTATCTTAGTTTGAATACGCAGCCCGTATCGATTCCGCTCTCGGTACGGTATACGCCACCAGCTCGGGGTAGTAAGGACTGCGCCTCCAGAGGAAACAAGAATGAAATCTAGTCCCGCGATCGCCCGTGACCCCTTAAGCCTGTTTACGTTAGCTAAGTAGTATCCAAACGAGAAGAATAAAGACCCCCGTTTAGAACGCTGCAAACTTAGCGCGAACGTGTTACACAGCGGTAGGTAGTAGGGGCAAGGCGACGAGTAAGATTGGCAGGCAGTGGCCTACATGTCCCCATGGTGAGCTTGGCTTGTCGCTATGTTGTGACTTAACGCCATATATCCAATTAAACCCCTGTGTGTAATTGTCCCCTCACCCAGGTCGACAATACTGTAAACAGCGAATGGTAGAACTCAAAAGCTTCGAGTCCCAGCATCGCGCATCGACCCCCCCGTATCAACGTTAGCACCTACGGCATGGGCTCGAAGAGCAGAACGGGCGATTACTTATGTTTTCCGACTTCTCAGTCACTCTCACTTCCATACTTGGTGGACACACTACACCCACTCGTCCTACACACATGCAAGGTTCGCGGCTCGAAGCAATACTGTAATACTGTCCTTTCTACGCGGACTGTACCTCTTCCAAACTTTAAAGGTGTCCGACTGGTGCAGTAAAGAAAGACGGCAATAGTCCGGTGAGTCCCCCACGGTTAAGCAGGACTGGTCTGGAGAACGTATGGAGTTCTGGGTCGTTATTGGTACTGCAAATGCGATTTGAGAATCGGTTACGAACCAGGATGAGGCTTCACAGAGTACTAAAGTAGCACTTTGTGTCATAGAATCCATCGAGGTAGTAACCGAGGGCACGGTCCCCATTTGCTCGGCAAGGCACCAGACTTTGATGGGGGCTATGCAATGAATATGTTTTTGGACTTACGATTGGTTCTCATTTTGGCACCCCGTACTGGAAGCGCCCCTTGTCAGGTAACTTGAGCTAGTAGCTAAAACACGGTGACCTGCTCCCAACGCGTTTCTCTTAATACCGAGCGCAACTAGGGTAAAGGTAATCGACAATTTATCGGCTGCACACGCCATCGCTGAACCGAGCGCGGGGCAAGATACTCACGAAGTTTTTTCCGGTCAGACCGGTCTAGCCCCCACCGCGCCTGAATGGCTGTCGTATCGTCAAGCACTAAATACTTATTCAATTATCCCCCGGTCCCCAAGATATGCGCGGGCGGGTCTGTAACCCAGTCTAAGGGTCATTATCGACACCCGGGCCCGCCTGAGCCCTTTCAAGTGAGTCTCTTGCTGTGCCGAATCGACAGGTGTAAGCACCACTGTGATGGTAGATGAAGCACGCTAAGTATGTATTTGGACGAATGGCT
TGACACGTCGATTCGAGGAGTCTTTCTGAACTGTGTCGAGCGCGCCGGAGGGCCTCTCAACAACCTCAGAAAGACGGAGCTCCGCGCGATATAAGACACTCGGGGTTGTGCGCAATCAAGAGGTTGAAAACCCCCTAATGGGAAATTTGGCCGTTGCTGGAAGCGCGTTATTACGTGGTCCGAAGGTGGTTCATATAGATCACAGAAACCACAGATATCGGCACTTGATTACCTCCGCGGATCACTGAAGGTCTATTGAGTGGATAATGATCGCTCGCCCGGCTTTAGTCTAACCCTGCCCTACCTGACTGCGATCAGTGTATTGCCGCGGTGAATACCGAGAAAATGAAGTGTTCCCTGGAAGTCATTTACCTATAGGAAACGGGACGTTCAACCTATTGAAATACCGGTGGATGGACTTCAAACCAGGGCCGCAGGCTTTTATGCATGTACCTGTCTTAGAAATGTGACTCTGTATGTTAGTCCTTTGATCGGATAAATATTCATGTTTTTGCACGCACCATCAAGTAGACATTATTCGGTATACATTCAGGGAACCCGCCCACACCTCCATTTACCGGGGTCAGCGCGGCCCGATTTTGGCAATTAGGGCCCGCCGGTAGTCCCAGCACAGGCATGACTTCGCCAATTTACGGACCCACGCTCACTCATCACCAGTCCTTCTGCTACGATCCCACAGCGGTGCTCTAGGACGTCCAATCACAGGAGGTTGCAAGCGGTGTGCCACTTTTTAGCCACTTGCCGACTTGGGAGCCAGAATACTTTCTATCCGTCTCCGGAGGAGGCTCGGTTTTCTGCTATTGATACTCTACGTCCTTCATTGGGAATCCGATGGTACTAATGGCATGCGGCGTGATCATACTGATGAGCGCCGGTCGATTTTATCGATCCTGGGCGGCTTACAGCTAAAACTCGGCGGAGCATCATCCGTCGCAGCAGCATCCTCCCGCGTAAATTGTGCTCAGAAATGCGTGCCAGAGATTCATGTCCCACGGCAGAAAATTCGATGCCGAAAGAGCTGAGTGTGCATAAACCAATACAGGTGTGGGACAAACTCCGTGGCTGGCGGATCAATTCATGTGGACCGATCCGCAGAATGTCGCGATGCTAGAGTCAGGCGTTTATCTCATACTAATCACCGTACCACCTTTGAGCAGACCTTAGCTGGTTTGTATGGAGGACTGCGGCTCTAAGCAGTGGGACGGAGCGACGCACATAGCGCCGCATCATCTCCGATGCTCACTCTGGGACTGATTATTTCTGCGCGACTCCTTGACTCGGCATCCGTTCTCGACATGCGCTCGACAACAGGGAGCCCCGAGGTTGGCCAAGACCCTAACCCGAAAAGTTCAATCCAAGGACGAAGCAGCCGCGATGGCATAAGGAGTCTAATGACCCGGTTATAAATCGCAAAAGAAACCGATACATCTATGAAAATGACTAGAGCCATATCGTACGCTGTCTCACCTGTGCCTTCTCATTACCTACTCAGGGTGTATTGGGCTCCCAGGCCCGTCATAGCTGTTCATATAAGACCTGGCTAGGAGGCGCAGGCCCTACGTACTTAGAGTAAACGCGTAAACAGCGCGGGCAGGTCCGAATTCCTTATTCGTGTGTGGAATCACAGATTGTCGGATAGGCAACGGACCAGCCTCTAGTGCCCAGGCCTAATCCCTCGTCAAGATCAATTCTTAGCCTGTCTTTTCTGAGAGACACTGAAAACCCACTAGAACAGGCCGCGTTACAGAACCAACGTGATTATATGTGACAGTCCCTGATCATCTTCAATCTTACAAGGCCCTAGTTCTAGAACAGAAAAGATCCGCTTTGGCTCACCCAATGTCATACCTCACCGAAGGCGGGATTAAGCAGTTTACACCAGTGGTCAACTATATTTCGTACTCTGCCACTTGCGAGGGAAGACGAAAGGCGCCCGCGTACTCCTGTGTGCGAACAGTCATTGCTGCTTAAGTCAGGAC
ATGTCGTAGACTATTTAATTGGTACTCTGCCCGTTTCACTCCCCGGTGTACTCGAATCTGCGGCCGAGGATGTGATATGTTCAGGTTTGCACTGGATGGTCAATGGCGCACCGCTGAATCATACCCCTCCCGACCAAACCATAAGGTGGGATAGGCCTCAAAATAGCCATTATGGACGTTGGAACTGGAATGGTTTCCGACAATCGTGCGATTCAACTGGTTGCCGTCAAATCTATGGACACGCCGTTTTTTACAGTCACTAACTAGTGCCCACAGAATGTGACCGCTGTCAACGTTTTACGCGAAGCAGAACTTGAGGTCCTCCTCACTGCTGCCTTCGCCTGCCAATCTTATGAAGGACCAGACCGTGCCTCATTCATAGTCTCCGACATTCATATATTCCTGGGTTTTCCGTAGTAGACAGATTCAAGGTAGTTACGGTAACCGCCAAAGGGAGTATGTAGTGTCAGTCGAATATCTAGCACACCGAACTGGATACGGCGTAAGTGAAGTTGTCTGGCATAATTGTGACGCGACCTTTTGGGAACGGAGCCATTTAGAATGTGACTATTTGTACAGAAGTGAAAGGCCAAGTGAATCACCTGTCTCCCGTTCAAAGGATATATTATGAAGACAATCTACCGACAGTTTGATATCATATGTTTAACCCGCTCCACAATATGGAGTACCCTTGTCGGGGGTCATCCCTAGAGGATTAGTGGTACTCGTGGACTCATCGATACATGCACCAGGTCTAAATCACGATAGGCACTCTTGTACGCGATTAAATCGGATACCGCGAGCATCGTTGGGTTGAAAACGAGCCTTGCACGTCTTGCTGAGACAAGCAAGTGAAGCCGCCGTCGCTTCTATACGTAATCAATGTCGGGGCATTGCCAGTGTCGCGCATAGGAGGTCATTCGCGCAGCACTCTTCAAGTCCAGTTATAACTGGACTCATCGGGACGATACGACTTAGGGGTTTTACCGAAGCCAAAAGGATGGGCCCCGCAGTTGTCAGCCAGCTGTCTAAATCCATACTCAAGAAAATGGTGTAGCCGTATCCCTCAACGATAGAGTGATGAACCGGGAGGCTCCCAGATTCCGCTGCATCCCCCTTGGAACGTGCGTTGTGAAGCGGGAGTCATTAACATCAGTGGAGGCTATACATATTCGTGGTCGAGGTGGCGGTGAAGACTAACCGGCGTGGCGTTGTGATTTCGGATTCGCGATTTCATAGAAGTGAAAAAACTCTTCCAGAAATAAACATGGGACTCTAAGCCACTGCCCTGGGCGTGGCTAGATGGCAGTTGGCTGCCAATTCCTAAGATCAGGTTGAGATCGAGCTCCAGTTGGTTGGTGAGTGAAGACGAACCGTCAGGCACCGGGGTTCACAAGCTCATAATCTGACAGTGCTCAAGATACCGGATATTGTCCGTCTCCGAGATTTACGGGCGTTCGCACCGAAGAAGGGAAAGTGTACCGGTCACTATCAACCACTCATGTATTTTTTGCTGGTCTATACCGCCCCGACTTTATGTCAGGTGTTATAAGTCGTATCTCCGAAACCGGTACGACCTAGATGATGGCGTTAGGAGATCTAGAAAACCGACGCCGGTTTCGCCTAGTTAGGATTACTTGACCTGCTGCGGTCTATTCCGTCCACCAGTTACCGTCCTCATCACTAATGTGCTGTATTGGGTGCTACTGAGAGAAGGGGTTAGACGCCACCCTCGTAACCTAGAATACGCGGCGAAGAGTGTAGATACTGTATTGGTGAGTCACGGCATCTTTCAGTGTGAGCTGAGGCACGCCGTCTAAGCAACGATGGTTTTATACGGATTTAGTAGTGCCAAGTTCTTTGTATGTCAAATCCAACCACATGGGAGCAGCACCGTATTCGTTGTTGCAAAGTGTTGTGTCAGTTTAATTTCTCACCTTTTGTGATCTCGCCGTCTGGGTACATTGGAGGGTAAGTTTGGTAATGGGAGGTCTCCACTC
"AGACAATTGG"
Length: 10
The Start Pos: 4447 45520 76827 45520 76827
"AGACCGACAC"
Length: 10
The Start Pos: 11512 32035 78589 32035 78589
"GCGCGTTTGG"
Length: 10
The Start Pos: 44143 52326 76843
This is the reference answer for an example I obtained. If your program can reproduce it, that shows your program has no problems.
The Same String in the Source String:
"ACAGGCCGTGCAGAGACTGACGATCAG"
Length: 27
The Start Pos: 11 46
Source String:
ACGTGCGATCACAGGCCGTGCAGAGACTGACGATCAGACGACGTGACAGGCCGTGCAGAGACTGACGATCAG
Also, no errors should turn up as long as the total string length stays within 10000, and this program runs quite well.
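I wrote one in awk and would like to compare its speed with C++. Please go here:
Download the 10000-character data file (I ran it through fold for display, so join it back into a single line), then find all non-overlapping repeated substrings of length 10 to 30 and time it.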
The Director's skill is truly admirable, but I think he may not have considered the extension problem, e.g.:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
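This sequence has no repeat longer than 10; if it has one at all, it can only be the sequence itself. Without extension, it would produce a huge number of repeats, starting at length 10 and shifting by a step of 1. Just my two cents; the problem may also have been stated unclearly.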
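The extension described here can be sketched as a longest-common-extension step. Given two start positions i < j whose characters match, keep extending to the right while they agree. For the non-overlapping variant, the length is capped at j - i so that the two copies share no character. `extend_match` is a hypothetical helper and does not come from any program posted in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how far the match starting at i and j (i < j) extends
// to the right. With allow_overlap == false the result is capped at j - i,
// so s[i, i+len) and s[j, j+len) never share a character.
std::size_t extend_match(const std::string& s, std::size_t i, std::size_t j,
                         bool allow_overlap) {
    assert(i < j);
    std::size_t len = 0;
    while (j + len < s.size() && s[i + len] == s[j + len]) ++len;
    if (!allow_overlap && len > j - i) len = j - i;
    return len;
}
```

On a string of 30 A's with i = 0 and j = 10, the overlapping extension is 20 characters and the non-overlapping one is 10. Extending each seed this way replaces the many shifted length-10 hits with a single longer repeat.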
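Finding the records in a database that are similar to a known gene sequence fragment, and marking where the fragment occurs in each record, is the most critical part of Blast. A Blast database holds thousands of gene sequence records, each containing thousands of bases stored in the database file in some encoding, so this search is also the most time-consuming operation in Blast.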
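When Blast runs a search, it must first be told the database name, the gene sequence data to search for, and the search mode. Once it has these parameters, Blast reads the query sequence data into a Bioseq structure, maps all records of the database into memory (MAP), and then compares them record by record, from the first to the last. For this comparison Blast uses a window algorithm, described as follows: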
Step 1: compute windows over the query sequence data. Suppose the query sequence is as follows (a nucleotide sequence):
>test
AATAATAAAATAGGGCGTGC
The corresponding nucleotide sequence encoding (Encode), with two bits per base (A=00, C=01, G=10, T=11), is:
00 00 11 00 00 11 00 00 00 00 11 00 10 10 10 01 10 11 10 01
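Define a 16-bit window. Looking at a binary string through this window, you can see only one contiguous 16-bit substring of it. The window can slide left and right along the binary string one base (2 bits) at a time. The value of the window is the value of the binary substring it covers, and the position of the window is the index of the last base it covers. For a given binary string and window, each window position corresponds to one 16-bit binary integer. The table below lists every window position on Encode and the corresponding window value: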
Position of Window Value of Window Binary String of Window
8 3120 0000110000110000
9 12480 0011000011000000
10 49920 1100001100000000
11 3075 0000110000000011
12 12300 0011000000001100
13 49202 1100000000110010
14 202 0000000011001010
15 810 0000001100101010
16 3241 0000110010101001
17 12966 0011001010100110
18 51867 1100101010011011
19 10862 0010101001101110
20 43449 1010100110111001
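The table can be reproduced with a rolling 16-bit value. For each base, shift the value left by 2 bits, OR in the base's code, and mask the result to 16 bits. The sketch below assumes that only the letters A, C, G and T occur. `base_code` and `window_values` are hypothetical names, and positions are 1-based as in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// 2-bit code per base, as in the encoding above: A=00, C=01, G=10, T=11.
std::uint32_t base_code(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    throw std::invalid_argument("not a nucleotide");
}

// Slide an 8-base (16-bit) window one base at a time and return
// (position, value) pairs, where position is the 1-based index of the
// last base in the window.
std::vector<std::pair<std::size_t, std::uint16_t>>
window_values(const std::string& seq) {
    std::vector<std::pair<std::size_t, std::uint16_t>> out;
    std::uint32_t w = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        w = ((w << 2) | base_code(seq[i])) & 0xFFFF;  // keep only the last 16 bits
        if (i + 1 >= 8) out.push_back({i + 1, static_cast<std::uint16_t>(w)});
    }
    return out;
}
```

For the query `AATAATAAAATAGGGCGTGC`, this yields the 13 rows of the table, from (8, 3120) to (20, 43449).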
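Step 2: define the same window over the sequence data of each record in the database, and compute its window values in order during matching. If some window value of the database sequence equals an entry in the query's list of window values, compare the sequence data before and after that window. If both sides match, a qualifying substring has been found. Otherwise, move the window 16 bits forward in the database sequence data and compute the next window value. Suppose a record in the database (nucleotide sequence data) is as follows: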
>Database Sequence
ACTTTCAATAATAAAATAGGGCGTGCAACT
The corresponding nucleotide sequence encoding (Encodebase) is:
00 01 11 11 11 01 00 00 11 00 00 11 00 00 00 00
11 00 10 10 10 01 10 11 10 01 00 00 01 11
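Define a window over it in the same way as for the query (windowbase). At position 8 the window value is 8144, which equals none of the query's window values. windowbase then moves forward 16 bits, and the window value at position 16 is 49920, which equals the query's window value at position 10. Comparing the remaining sequence data before and after this window then shows that the database sequence contains a fragment identical to the query sequence. Continue in the same way until all qualifying records have been found.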
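Step 2 can be sketched as follows. The query's windows are listed at every base, but the database is sampled only every 8 bases (16 bits). Even so, any occurrence of a query of at least 15 bases (2*8 - 1) contains at least one sampled database window, so no match is missed. Each hit is then confirmed by comparing the full query. This is a simplified illustration under those assumptions, not Blast's actual code. `window_search` is a hypothetical name, and the sketch works on plain strings rather than packed 2-bit data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// 2-bit code per base: A=00, C=01, G=10, T=11.
std::uint32_t base_code(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    throw std::invalid_argument("not a nucleotide");
}

// Hypothetical helper: 0-based start offsets of exact occurrences of query in db.
std::vector<std::size_t> window_search(const std::string& query,
                                       const std::string& db) {
    const std::size_t W = 8;  // bases per 16-bit window
    if (query.size() < 2 * W - 1) throw std::invalid_argument("query too short");
    // Step 1: window value -> 1-based positions (last base) in the query.
    std::unordered_map<std::uint32_t, std::vector<std::size_t>> table;
    std::uint32_t w = 0;
    for (std::size_t i = 0; i < query.size(); ++i) {
        w = ((w << 2) | base_code(query[i])) & 0xFFFF;
        if (i + 1 >= W) table[w].push_back(i + 1);
    }
    // Step 2: look up the database window only every 8 bases (16 bits).
    std::set<std::size_t> hits;  // sorted, without duplicates
    w = 0;
    for (std::size_t i = 0; i < db.size(); ++i) {
        w = ((w << 2) | base_code(db[i])) & 0xFFFF;
        std::size_t pos = i + 1;  // 1-based position of the window's last base
        if (pos % W != 0) continue;
        auto it = table.find(w);
        if (it == table.end()) continue;
        for (std::size_t qpos : it->second) {
            if (pos < qpos) continue;        // alignment would start before db
            std::size_t start = pos - qpos;  // 0-based start of the aligned query
            // Confirm by comparing the whole query, before and after the window.
            if (start + query.size() <= db.size() &&
                db.compare(start, query.size(), query) == 0)
                hits.insert(start);
        }
    }
    return std::vector<std::size_t>(hits.begin(), hits.end());
}
```

On the example record above, the query is found at 0-based offset 6, that is, starting at the seventh base of `ACTTTCAATAATAAAATAGGGCGTGCAACT`.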
For details, see http://www.bioinfo.org.cn/computer/blastand.htm
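I treat the string as two identical copies: one stays fixed while the other slides along it (like a zipper), and at each step I take the longest matching substring...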
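The "zipper" idea compares the string with a copy of itself shifted by d, for every d. It can be sketched as below. Along each shift, a maximal run of equal characters is a repeat that cannot be extended further left or right. The sketch assumes overlapping repeats are allowed. `Repeat` and `zipper_repeats` are hypothetical names. It runs in O(n^2) time, which is about 5 x 10^7 character comparisons for n = 10000.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: s[first, first+length) == s[second, second+length).
struct Repeat {
    std::size_t first, second, length;
};

// "Zipper" scan: for each shift d, slide a copy of s by d against itself and
// record every maximal run of matching characters at least min_len long.
// Overlapping repeats (length > d) are reported as-is.
std::vector<Repeat> zipper_repeats(const std::string& s, std::size_t min_len) {
    assert(min_len > 0);
    std::vector<Repeat> out;
    const std::size_t n = s.size();
    for (std::size_t d = 1; d < n; ++d) {
        std::size_t run = 0;  // current number of consecutive matches
        for (std::size_t i = 0; i + d <= n; ++i) {
            // i + d == n acts as a sentinel that closes the last run.
            if (i + d < n && s[i] == s[i + d]) {
                ++run;
                continue;
            }
            if (run >= min_len) out.push_back({i - run, i - run + d, run});
            run = 0;
        }
    }
    return out;
}
```

For a non-overlapping variant, cap each run length at its shift d before comparing it with `min_len`. Because each reported run is already maximal, the shifted copies of one repeat (such as "AACTTCAGCG" at 518 and "ACTTCAGCGA" at 519 in the output above) come out as a single entry.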
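In my program the substring length is random, and it produces a lot of results that aren't ordered by repeat, so they're hard to look through.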
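Originally posted by "THEBEST":
I don't quite follow you, and I don't know AWK. The one I wrote in C++ has no optimization and uses a clumsy method, so I don't think it's well written. It's also incorrect (it has bugs).
Quote:
Download the 10000-character data file (I ran it through fold for display, so join it back into a single line), then find all non-overlapping repeated substrings of length 10 to 30 and time it.
Why 10 to 30? Any length above 10 qualifies (that's what the problem requires). And why non-overlapping? My version handles overlapping repeats. Restricting to non-overlapping ones leaves fewer cases to handle, so it would also run somewhat faster. While you're at it, try a source string of 10 × 10000 characters with a minimum repeat length of 10 (and no maximum length), and see how fast it is.
You can run my program directly; just change the output file path C:\\result.txt. I don't have LINUX/UNIX at the moment, so I can't run your program.
Is your program completely correct? I think that's hard to verify. :)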
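Originally posted by "lightspeed":
C really is much faster. But if you read the data from a file, it should be a little slower.
Also, string operations in awk are slow. I plan to drop the substr calls, and I expect a big improvement.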
Welcome to Chinaunix (http://bbs.chinaunix.net/) | Powered by Discuz! X3.2