Last edited by 咏咏672418539 on 2015-12-07 10:00
File mydata (sequences built from the four bases A, T, C and G; N marks an undetermined base)
>1
ATGTCTAAANGTTCCTACTATTTTGAACCCTACTGANANGANAGACCTTCAACTCTTATT
>2
GCCAAGAACCTCAACACTTGATTAACCTTGG
>3
CAAGACGTGGGAAAAGCTCATCTTTGCTGCTATTGTGGTTGTC
>4
TCTGCTCGTCCCTACGGCCACCGTGCCGCCTT
>5
CTTCACGCCAGGTACGTTTACCAATTACATCAGCTGAC
The program is:
#!/usr/bin/python2
# coding: utf-8
from collections import defaultdict

DATA = 'mydata'  # input file name
F = open(DATA)
U = 6  # window length: hexamers
V = [0.25 ** n for n in xrange(U + 1)]  # weight per expansion of a window with n N's
N = 'AGCT'  # bases an N may stand for
for line in F:
    print line,  # header line
    seq = F.next().strip()  # sequence line, without the trailing newline
    dic = defaultdict(int)
    for i in xrange(len(seq) - U + 1):
        sub = seq[i:i + U]
        num = sub.count('N')
        if num == 0:
            dic[sub] += 1
            continue
        # Expand every N into A, G, C and T; each expansion gets weight 0.25 ** num.
        L = ['']
        for c in sub:
            if c == 'N':
                L = [e + C for e in L for C in N]
            else:
                L = [e + c for e in L]
        for e in L:
            dic[e] += V[num]
    for k, v in sorted(dic.items()):
        print k, v
    tot = float(len(seq) - U + 1)  # number of windows in this sequence
    for k, v in sorted(dic.items()):
        print '%s\t%f\t%f' % (k, v, v / tot)
The output is:
AGCTAG 1 0.500000
GCTAGA 2 0.125000
GCTAGC 1 0.125000
GCTAGG 1 0.125000
GCTAGT 3 0.125000
……
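I want the results to go into a file that I specify, for example one named 3.py. Where should I add the command? Do I need to create 3.py on that drive beforehand?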
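One way to do this, as a minimal sketch rather than a definitive answer: open the target file with open(path, 'w') and write each line to it instead of printing. Mode 'w' creates the file if it does not exist (and overwrites it if it does), so nothing has to be created beforehand. The function names count_windows and write_report are hypothetical and not part of the program above; the counting logic is a condensed restatement of it, and the extra listing of raw counts is left out.

```python
from collections import defaultdict

U = 6           # window length (hexamers), as in the program above
BASES = 'AGCT'  # bases an N may stand for


def count_windows(seq):
    """Return (counts, n_windows) for all length-U windows of seq.

    A window containing num N's is expanded into 4 ** num concrete
    hexamers, each credited with weight 0.25 ** num.
    """
    counts = defaultdict(float)
    n_windows = max(len(seq) - U + 1, 0)
    for i in range(n_windows):
        sub = seq[i:i + U]
        num = sub.count('N')
        expansions = ['']
        for c in sub:
            choices = BASES if c == 'N' else c
            expansions = [e + x for e in expansions for x in choices]
        for e in expansions:
            counts[e] += 0.25 ** num
    return counts, n_windows


def write_report(in_path, out_path):
    """Write header lines and 'hexamer<TAB>count<TAB>frequency' rows to out_path."""
    # Assumption: the input alternates header line / sequence line, like mydata.
    with open(in_path) as src:
        lines = [ln.strip() for ln in src if ln.strip()]
    # 'w' creates out_path if it is missing and overwrites it otherwise.
    with open(out_path, 'w') as out:
        for header, seq in zip(lines[0::2], lines[1::2]):
            out.write(header + '\n')
            counts, total = count_windows(seq)
            for k, v in sorted(counts.items()):
                out.write('%s\t%f\t%f\n' % (k, v, v / total))
```

Calling write_report('mydata', '3.py') would put the table into 3.py in the current directory; any other name, such as result.txt, works the same way. Alternatively, the program can stay untouched and its standard output can be redirected in the shell: python2 script.py > 3.py, where script.py stands for whatever the program file is called.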