Help with a Python for loop

yusmile0618 posted on 2014-06-09 12:45

I have three files in total:
1 is the sample IDs
2 is the information for each site
3 is the samples' site mutation information

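Not every sample has mutation information for every site. I wrote a for loop: where a sample has a mutation at a site, take the mutation from file 3; otherwise write the original, unmutated site information from file 2. Every sample in file 1 needs information for every site in file 2, so the final output should have (number of rows in file 1) x (number of columns in file 2) entries. But my for loop produces only one row.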

Screenshots of the files and the code are below:
import xlwt

fname=open("1.txt","r")
flocation=open("2.txt","r")
fsnp=open("3","r")

book=xlwt.Workbook(encoding="utf-8",style_compression=0)
sheet=book.add_sheet("snp",cell_overwrite_ok=True)

n=0
for names in fname.readlines():
    row=0
    name=names.strip("\n")
    row=row+1
    n=n+1
    for locations in flocation.readlines():
        col=5
        location=locations.split("\t")
        col=col+1
        snp1=location
        position=str(location)
        ref=location+" "+location
        for snps in fsnp:
            snps=snps.split("\t")
            sample_name=snps
            sample_snp=snps
            sample_position=str(snps)
            genotype=snps
            allele=genotype+" "+genotype
            if name==sample_name and position in sample_position:
                sheet.write(row,col,allele)
            else:
                #sheet.write(row,col,ref)

book.save("result.xls")

print "done"

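icymirror posted on 2014-06-09 16:23

Reply to 1# yusmile0618
Could you give an example of the result you expect?
I suspect your row/col assignments are in the wrong place. Also, you don't seem to increment or decrement row/col in the final if/else.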
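
A minimal sketch of what a misplaced assignment does, written for Python 3 rather than the Python 2 of the thread. The sample names are made up for illustration; only the position of `row = 0` differs between the two loops.

```python
samples = ["s1", "s2", "s3"]   # hypothetical sample names

# Resetting inside the loop: row is 1 on every pass,
# so every sample would be written to the same row.
rows_reset_inside = []
for name in samples:
    row = 0
    row = row + 1
    rows_reset_inside.append(row)

# Initializing once before the loop: row advances with each sample.
rows_init_outside = []
row = 0
for name in samples:
    row = row + 1
    rows_init_outside.append(row)

print(rows_reset_inside)   # [1, 1, 1]
print(rows_init_outside)   # [1, 2, 3]
```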

yusmile0618 posted on 2014-06-09 16:36

Hello, thanks for the reply.
Here is a picture of the result I expect.

Reply to 2# icymirror

q1208c posted on 2014-06-09 16:40

Too long, couldn't follow it, giving up. :-L

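icymirror posted on 2014-06-09 17:39

Reply to 3# yusmile0618
I think I roughly understand now (though I'm still not completely sure).
At least these places need changing:
1. row=0 on line 12 needs to move outside the main loop, e.g. swap it with line 11.
2. col=5 on line 17 has the same problem; swap it with line 16.
3. Also: after the loops that start on lines 16 and 23 finish, add flocation.seek(0) and fsnp.seek(0), so that every pass over files 2 and 3 starts from the beginning instead of reading through once and then seeing only end-of-file.
(Mind the indentation: it should match the for at the head of the loop, not the loop body.)
These are just the ones I spotted quickly; try these changes first.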
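
Point 3 can be illustrated with a small sketch (Python 3). It uses `io.StringIO` as a stand-in for file 2, and the contents are invented; a file opened with `open()` behaves the same way when iterated.

```python
import io

# Stand-in for file 2; the contents are made up for illustration.
flocation = io.StringIO("pos1\npos2\n")

first_pass = [line.strip() for line in flocation]
second_pass = [line.strip() for line in flocation]   # already at end of file
flocation.seek(0)                                     # rewind to the start
third_pass = [line.strip() for line in flocation]

print(first_pass)    # ['pos1', 'pos2']
print(second_pass)   # []
print(third_pass)    # ['pos1', 'pos2']
```

This would explain the single row of output: after the first sample, the inner loops have nothing left to read.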

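yusmile0618 posted on 2014-06-09 21:28

Hello, I have moved row and col outside the main loop.
flocation.seek(0) is followed by another for loop, so aligning it with for ... in flocation raises an error.
When I added fsnp.seek(0) with the indentation aligned, the program went into an infinite loop.

Without the two seek statements, I still get only one row of output.

Could you take another look?

Reply to 5# icymirror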

icymirror posted on 2014-06-11 14:11

Reply to 6# yusmile0618
I'm not sure we mean the same thing by "aligned", so here is my modified version:
import xlwt

fname=open("1.txt","r")
flocation=open("2.txt","r")
fsnp=open("3.txt","r")

book=xlwt.Workbook(encoding="utf-8",style_compression=0)
sheet=book.add_sheet("snp",cell_overwrite_ok=True)

n=0
row=0
for names in fname.readlines():
    name=names.strip("\n")
    row=row+1
    n=n+1
    col=5
    for locations in flocation.readlines():
        location=locations.split("\t")
        col=col+1
        snp1=location
        position=str(location)
        ref=location+" "+location
        for snps in fsnp:
            snps=snps.split("\t")
            sample_name=snps
            sample_snp=snps
            sample_position=str(snps)
            genotype=snps
            allele=genotype+" "+genotype
            if name==sample_name and position in sample_position:
                print "%d, %d, %s"%(row, col, allele)
                sheet.write(row,col,allele)
            else:
                pass
                #sheet.write(row,col,ref)
        fsnp.seek(0)
    flocation.seek(0)

book.save("result.xls")

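yusmile0618 posted on 2014-06-11 17:15

Hello, thank you very much for the modified code.
It prints everything that is in file 3, but not the entries that are missing from file 3. Why can't the else branch here write the third column of file 2 when the position is not in file 3?
Reply to 7# icymirror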
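
One likely reading of the problem, sketched in Python 3: the else branch belongs to the if, so it runs once for every line of file 3 that does not match, not once per cell. If the commented-out sheet.write(row,col,ref) were enabled there, a later non-matching line would overwrite an earlier match, which cell_overwrite_ok=True permits. A common alternative is to start from the reference value and replace it only on a match. The rows and the `cell_value` helper below are hypothetical, and the three-column layout is an assumption.

```python
# Hypothetical rows of file 3: (sample name, position, genotype).
snp_rows = [("s1", "100", "A G"), ("s2", "200", "C T")]

def cell_value(name, position, ref):
    """Return the sample's genotype at position, or ref if file 3 has none."""
    value = ref                      # default: the reference from file 2
    for sample_name, sample_position, genotype in snp_rows:
        if name == sample_name and position == sample_position:
            value = genotype         # a match replaces the default
            break
    return value

print(cell_value("s1", "100", "A A"))   # A G
print(cell_value("s1", "200", "C C"))   # C C
```

The cell is then written once, after the inner loop, instead of inside the if/else.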

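Hadron74 posted on 2014-06-11 20:00

Last edited by Hadron74 on 2014-06-11 20:02

That is not how this program should be written.

Suggestion:
If your data fits in Excel, it is small. I suggest loading files 2 and 3 into memory as dictionaries and keeping input and output separate; that is more concise and easier to debug.
Here is my code. I had no data, so it is untested. I hope it helps.
import xlwt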

all_snps = {}
for snp in open("3.txt"):
    snps = snp.rstrip().split('\t')
    sample_name = snps
    sample_position = snps
    allele = " ".join(snps)
    if sample_name not in all_snps:
        all_snps = []
    all_snps.append((sample_position,allele)) # store alleles in a dictionary

locations = {}
locationsKeys = []
for location in open("2.txt"):
    loc = location.rstrip().split('\t')
    position = loc
    ref=" ".join(loc*2)
    locations = ref                      # store default alleles in locations
    locationsKeys.append(position)                # record the locations in order

book=xlwt.Workbook(encoding="utf-8",style_compression=0)
sheet=book.add_sheet("snp",cell_overwrite_ok=True)

row = 0
for sample in open("1.txt"):
    name = sample.strip()
    row += 1
    alleles = dict(locations)# set a new dict with reference alleles for initial

    # here you can add some filter for number of snp by eg. if len(all_snps) < 2 : continue

    for sample_postion,allele in all_snps:
        alleles = allele # update new alleles
    # output to Excel

    col = 5
    for loc in locationsKeys:       # write alleles according to the order
        col += 1
        sheet.write(row,col,alleles) # get allele from alleles

book.save("result.xls")
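
A self-contained sketch of the same dictionary idea, written for Python 3 and without xlwt so it runs on its own. Everything about the data is an assumption made for illustration: the in-memory stand-ins for the three files, the column layout (file 2 as position and reference base, file 3 as sample, position and base), and the choice to build a list of rows instead of an Excel sheet.

```python
import io

# Made-up stand-ins for 1.txt, 2.txt and 3.txt; the column layout is assumed.
names_file = io.StringIO("s1\ns2\n")
locations_file = io.StringIO("100\tA\n200\tC\n")       # position, reference base
snps_file = io.StringIO("s1\t100\tG\ns2\t200\tT\n")    # sample, position, base

# Input step 1: sample name -> {position: allele}
all_snps = {}
for line in snps_file:
    sample_name, position, base = line.rstrip("\n").split("\t")
    all_snps.setdefault(sample_name, {})[position] = base + " " + base

# Input step 2: position -> reference allele (dicts keep insertion order in 3.7+)
locations = {}
for line in locations_file:
    position, ref_base = line.rstrip("\n").split("\t")
    locations[position] = ref_base + " " + ref_base

# Output step: one row per sample, one column per position
table = []
for line in names_file:
    name = line.strip()
    alleles = dict(locations)                 # start from the reference alleles
    alleles.update(all_snps.get(name, {}))    # overlay this sample's mutations
    table.append([name] + [alleles[pos] for pos in locations])

for row in table:
    print("\t".join(row))
```

Because each file is read exactly once, no seek(0) calls are needed, and a sample with no entries in file 3 simply keeps the reference alleles.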

yusmile0618 posted on 2014-06-11 20:32

:em02:
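Thank you very much for your guidance; the program runs successfully.
I have only just started learning Python and am not yet comfortable with dictionaries, so I still need to study more.
Thanks again!
Reply to 9# Hadron74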

Chinaunix's Archiver