Last edited by 小风0000 on 2016-08-16 05:07
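```python
# a small sample of the data (whitespace-separated, header row first)
dat = '''
SNPID A702Y A704Y A706Y A708Y A710Y
ARS-BFGL-BAC-10172 CC CC CC CC CC
ARS-BFGL-BAC-1020 CC CC CT CC CC
'''
# the columns to extract
names = ["SNPID", "A702Y", "A710Y"]
```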
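The real data is fairly large, with 40,000 rows and 7,000 columns, and I need to extract 800 of those columns. Does anyone have a good way to do this?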
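On the sample above, the task comes down to looking up where each wanted name sits in the header row and then keeping only those positions on every row. The following is a minimal in-memory sketch, assuming whitespace-separated fields and that the first column (SNPID) is always kept first. The helper name `select_columns` is hypothetical, not something from the post.

```python
dat = '''
SNPID A702Y A704Y A706Y A708Y A710Y
ARS-BFGL-BAC-10172 CC CC CC CC CC
ARS-BFGL-BAC-1020 CC CC CT CC CC
'''
names = ["SNPID", "A702Y", "A710Y"]


def select_columns(text, wanted):
    """Hypothetical helper: keep only the columns of `text` named in `wanted`."""
    rows = [line.split() for line in text.strip().splitlines()]
    # map each header name to its column position once (a dict lookup,
    # instead of a linear list.index() search per name)
    pos = {name: i for i, name in enumerate(rows[0])}
    # column 0 (SNPID) always comes first; names missing from the header are skipped
    idx = [0] + [pos[n] for n in wanted if n in pos and pos[n] != 0]
    # each row is split exactly once, then indexed
    return [[row[i] for i in idx] for row in rows]


for row in select_columns(dat, names):
    print(" ".join(row))
# SNPID A702Y A710Y
# ARS-BFGL-BAC-10172 CC CC
# ARS-BFGL-BAC-1020 CC CC
```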
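```python
import sys

# usage: python script.py originalFN targetFN
script, originalFN, targetFN = sys.argv

# header row of the big file: SNPID followed by the sample IDs
with open(originalFN) as f:
    originalInds = f.readline().strip().split()
# sample IDs to extract, one per line
with open(targetFN) as f:
    targetInds = [line.strip() for line in f]

# find the column index of each wanted sample
idx = [originalInds.index(ind) for ind in targetInds if ind in originalInds]
# keep column 0 (SNPID) as the first output column, without duplicating it
if 0 not in idx:
    idx.insert(0, 0)

# output: write the selected columns of every line (header included)
with open(originalFN) as src, open("targetInds.txt", "w") as targetF:
    for num, line in enumerate(src):
        print(num)  # progress counter, one line per input row
        # note: line.strip().split() is re-run for every index in idx
        tmp = [line.strip().split()[i] for i in idx]
        targetF.write(" ".join(tmp) + "\n")
```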
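This is the code I wrote: it first looks up the indices of the wanted column names in the header, then goes through the big file line by line and pulls those columns out. It's rather slow, so I'm asking for help!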
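One likely cause of the slowness is the list comprehension `[line.strip().split()[i] for i in idx]`: it strips and splits the whole 7,000-field line again for every selected index, so each of the roughly 40,000 rows is split about 800 times (around 40,000 × 800 = 32 million splits) instead of once. Printing a progress number for every row adds terminal output on top of that. Below is a minimal sketch of the same approach that splits each row once and picks the fields with `operator.itemgetter`. It assumes whitespace-separated fields, a header in the first line, and one target name per line; the function name `extract_columns` and its three-argument command line are hypothetical.

```python
import sys
from operator import itemgetter


def extract_columns(original_fn, target_fn, out_fn):
    """Hypothetical helper: copy column 0 plus the named columns to out_fn."""
    with open(target_fn) as f:
        targets = [line.strip() for line in f if line.strip()]

    with open(original_fn) as src, open(out_fn, "w") as dst:
        header = src.readline().split()
        # header name -> column position, built once
        pos = {name: i for i, name in enumerate(header)}
        idx = [0] + [pos[t] for t in targets if t in pos and pos[t] != 0]
        if len(idx) < 2:
            raise ValueError("none of the target names occur in the header")
        # with two or more indices, itemgetter returns a tuple of those fields
        pick = itemgetter(*idx)

        dst.write(" ".join(pick(header)) + "\n")
        for line in src:
            fields = line.split()  # split each row exactly once
            if fields:  # skip blank lines
                dst.write(" ".join(pick(fields)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python extract.py originalFN targetFN outFN
    extract_columns(*sys.argv[1:4])
```

Unlike the original, this version takes the output file name as an argument instead of writing to a fixed `targetInds.txt`; that choice is only for illustration.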