据说会python的人一般精通perl

huang6894 发表于 2014-01-16 21:51

额，不好意思啊~我一学长给了我一个程序，可我一点都不懂python，只能大概明白它的意思，如果有大神愿意感兴趣的话，能不能试着把下面的这个python程序改成兼容性更好的perl呢，坐等大神出现！！！
p.s:也在perl区提出了，求关注~
下面是代码：#usage: python cds_abs.py input_file output_file

# the input file contains 2 columns
#eg;
#NM_000497.0 c.595+16G>T
#
import re
import sys
def usage():
   print '''
   python cds_abs.py input_file output_file
   '''
if len(sys.argv)!=3:
   sys.exit(usage())
import gzip
fin=sys.argv
fout=sys.argv
f=open(fin,'r')
o=open(fout,'w')
######
def position(nm,ps):
   global ore
   cds=gzip.open(‘hg19.dbref.gz')
#    cds=open('hg19.dbref.sort')
   ps = int(ps)
   while 1:
            line=cds.readline()
            if not line:
                     break
            nm1=re.search("NM_(\d+)",line.split('\t'))
            if not nm1:
                     continue
            else:
                     flag=line.split('\t')
                     global chr
                     chr = line.split('\t')
            nm1=nm1.group(1)
            if nm==nm1:
                     while 1:
                           pos=cds.readline()
                           if not re.match("^C\d+",pos):
                                    continue
                           else:
                                    cdspos=pos.split('\t')
                                    if flag == '+':
                                             ore = '+'
                                             for num in range(int(cdspos),int(cdspos)+1):
                                                   ps-=1
                                                   if ps==0:
                                                            return (num,flag)
                                    else:
                                             ore = '-'
                                             for num in range(int(cdspos),int(cdspos)-1,-1):
                                                   ps-=1
                                                   if ps==0:
                                                            return (num,flag)
   cds.close()
#########
def result(nm,ps):
   fu = re.search("^-(\d+)",ps)
   if fu:
#             print fu.group(1)
            (res,flag)=position(nm,'1')
            if ore == '+':
                     return (res-int(fu.group(1)))
            else:
                     return (res+int(fu.group(1)))
   he=re.search("(\d+)\-(\d+)",ps)
   if he:
            (res,flag)=position(nm,he.group(1))
            if flag == '+':
                     return (res-int(he.group(2)))
            else:
                     return (res+int(he.group(2)))
   else:
            hehe=re.search("(\d+)\+(\d+)",ps)
            if hehe:
                     (res,flag)=position(nm,hehe.group(1))
                     if flag == '+':
                           return res+int(hehe.group(2)),
                     else:
                           return res-int(hehe.group(2))
            else:
                     (res,flag)=position(nm,ps)
                     return res
#res=position("000015",20)
#print res
####
#res=result("014020","205")
#print res
###

########### for the finding of mutation in 3'UTR
def utr3(nm, ps):
   global ore
   ps=int(ps)
   cds=gzip.open('hg19.dbref.gz')
   while 1:
            line=cds.readline()
            if not line:
                     break
            nm1=re.search("NM_(\d+)",line.split('\t'))
            if not nm1:
                     continue
            nm1 = nm1.group(1)
            if nm==nm1:
                     global chr
                     chr = line.split('\t')
                     flag = line.split('\t')
                     ut1 = line.split('\t')
                     ut2 = line.split('\t')
                     while 1:
                           pos = cds.readline()
                           if not re.match("^3U\d+",pos):
                                    continue
                           utrpos = pos.split('\t')
                           if flag == '+':
                                    ore = '+'
                                    for num in range(int(utrpos),int(utrpos)+1):
                                             ps -= 1
                                             if ps == 0:
                                                   return num
                           else:
                                    ore = '-'
                                    for num in range(int(utrpos), int(utrpos)-1,-1):
                                             ps -= 1
                                             if ps == 0:
                                                   return num

for line in f.readlines():
   word=line.split('\t')
   nm=re.search("NM_(\d+)",word)
   if nm:
            nm=nm.group(1)
   else:
            o.write("-\t-\t-\t-\t-\n")
            continue
   mut=re.search("c\.(+)_(+)",word)
   if mut:
            res=str(result(nm,mut.group(1)))
            res+='..'+str(result(nm,mut.group(2)))
   else:
            mut=re.search("c\.(+)",word)
            if mut:
                     res=result(nm,mut.group(1))
            else:
                     mut=re.search("c\.(\d+)",word)
                     if mut:
                           res=positon(nm,mut.group(1))
                     else:
                           ##### c.*111A>G : the 3'UTR region
                           utr = re.search("c\.\*(\d+)",word)
                           if utr:
                                    res = utr3(nm, utr.group(1))
                           else:
                                    res="ERR"
   o.write(chr+"\t"+str(res)+'\t'+ore+"\t"+line)
#             delpos=re.search("(\d+)_(\d+)",word)
#             delpos=delpos.group(1)
#             a=delpos.group(1)
#             b=delpos.group(2)
#             res=position(nm,a)
#             res+=position(nm,b)
#    print res
据说这个程序的逻辑是比较差的，遇到错误不能跳过，效率上也不怎么好。不过实现起来非常好用~
大神们，来吧来吧，嘻嘻，谢谢

remyzane 发表于 2014-01-16 22:05

本帖最后由 remyzane 于 2014-01-16 22:41 编辑

> 据说会python的人一般精通perl

楼主这句式风骚啊！我也来一个：

据说会炒菜的人一般精通洗碗

:outu::outu::outu::outu::outu::outu::outu::outu::outu::outu:

ssfjhh 发表于 2014-01-17 09:42

楼主感情这是拿坛友当免费劳动力使呀。:outu:

huang6894 发表于 2014-01-17 10:08

回复 3# ssfjhh

对不起

play9091 发表于 2014-01-17 17:41

貼一個這麼長的東西出來，太傷神了……

1. 沒說明這東西是做什麼用的
2. 你已經找出一些間鍵的話大家也好幫忙

直接把全部貼出來真的是幫不上忙啊！

huang6894 发表于 2014-01-17 19:03

回复 5# play9091

其实我也不知道可以做什么的，据说是在一个大文件中匹配对应的位置得到坐标。对不起各位捧场的亲了，sorry

substr函数 发表于 2014-01-17 19:32

人一般精通perl ,据说会python
人一般精通python ,据说也不知道会perl

huang6894 发表于 2014-01-17 19:57

回复 7# substr函数

噢？呵呵，是吗

jun413947139 发表于 2014-01-18 00:18

好丑陋的代码..............顿时无趣，不想写了

huang6894 发表于 2014-01-18 00:30

回复 9# jun413947139

大神，我作为一个菜鸟，不应受到这样的打击的。。。

页: [1] 2

Chinaunix's Archiver

据说会python的人一般精通perl