平台论坛博客文库

› 论坛 › 程序设计 › Python › Python文档中心 › 使用python提取人类基因组片段

使用python提取人类基因组片段 [复制链接]

blackjimmy

丰衣足食

论坛徽章:: 0

电梯直达

1楼 [收藏(0)] [报告]

发表于 2009-04-04 14:49 |只看该作者 |倒序浏览

参考论坛讨论：
http://bbs.chinaunix.net/viewthread.php?tid=1370441

根据大家的讨论，我改了下程序，运行的时间大大缩短。原来提取一条序列大概1秒，现在提取10万条也要不了1分钟。呵呵！看来有时候python也是相当的快啊。只要用好了函数和有好的思想。
下面奉上我的程序，请指教：

import os
def get_oneseq(chr, start, end): # chr1, 23, 65
f = open(chr+'.fa', 'r')
head = f.readline() # >chr1\n
firstseqline = f.readline() # taaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta\n
offset = len(head) # 6
linelen = len(firstseqline) # 51
startpos = offset + start % (linelen-1) + (start/(linelen-1))*linelen-1
endpos = offset + end % (linelen-1) + (end/(linelen-1))*linelen-1
f.seek(startpos)
seq = f.read(endpos-startpos+1)
sequence = seq.replace(os.linesep,'')
return sequence
def parsebed(file,header = 0):
f3 = open(bedfile,'r')
f4 = open(file+'_seq.txt','w')
if header == 1:
      newlines = f3.read().split(os.linesep)[1:-1]
else:
      newlines = f3.read().split(os.linesep)[:-1]
seq = []
for i,c in enumerate(newlines):
      chr = c.split()[0]
      start = c.split()[1]
      end = c.split()[2]
      name = i+1
      sequence = get_oneseq(chr, int(start), int(end))
      print >>f4, '>'+str(name)
      print >>f4, sequence
f3.close()
f4.close()
if __name__ == '__main__':
file = 'test.txt'
parsebed(file,1)

本文来自ChinaUnix博客，如果查看原文请点：http://blog.chinaunix.net/u1/33851/showart_1890193.html

文库|博客

返回列表

Chinaunix › 论坛 › 程序设计 › Python › Python文档中心 › 使用python提取人类基因组片段

使用python提取人类基因组片段 [复制链接]

浏览过的版块