Chinaunix


A Perl bioinformatics question: asking the experts for help

1# Posted on 2011-03-24 10:15
Last edited by aids260 on 2011-03-24 10:18

Here is my problem. I have the following two files.

First file:
  +        BGIBMGA000062-TA         FWDP30_FL5_P09.seq        nscaf1071        1318628        1239395        1240559
  +        BGIBMGA000062-TA         MFBP02_F_H18.seq        nscaf1071        1318628        1239668        1243526
  +        BGIBMGA000064-TA         fdpeP10_F_J11.seq        nscaf1071        1318628        1313192        1316067
  +        BGIBMGA000064-TA         MFBP04_F_N06.seq        nscaf1071        1318628        1313197        1316106
  +        BGIBMGA000128-TA         FWDP26_FL5_O06.seq        nscaf1108        3459965        1275912        1278501
  +        BGIBMGA000129-TA         FWDP02_FL5_F17.seq        nscaf1108        3459965        1283141        1319375
  +        BGIBMGA000129-TA         FWDP02_FL5_O19.seq        nscaf1108        3459965        1282859        1319327
  +        BGIBMGA000129-TA         FWDP10_FL5_A23.seq        nscaf1108        3459965        1283141        1319395
  +        BGIBMGA000129-TA         FWDP31_FL5_I15.seq        nscaf1108        3459965        1283115        1319315
  +        BGIBMGA000129-TA         MFBP15_F_I08.seq        nscaf1108        3459965        1283141        1319403
  +        BGIBMGA000140-TA         FWDP04_FL5_M21.seq        nscaf1108        3459965        1773552        1776967
  +        BGIBMGA000140-TA         FWDP25_FL5_J23.seq        nscaf1108        3459965        1785487        1791348
  +        BGIBMGA000154-TA         FWDP03_FL5_B14.seq        nscaf1108        3459965        2355076        2357513
  +        BGIBMGA000154-TA         fdpeP09_F_E13.seq        nscaf1108        3459965        2355077        2357478

Second file:
  >BGIBMGA000061-TA  cds:novel  sequence:nscaf1071:1233907:1234573:+  gene:BGIBMGA000061  protein:BGIBMGA000061-PA
  >BGIBMGA000062-TA  cds:novel  sequence:nscaf1071:1239819:1245122:+  gene:BGIBMGA000062  protein:BGIBMGA000062-PA
  >BGIBMGA000063-TA  cds:novel  sequence:nscaf1071:1293757:1306270:+  gene:BGIBMGA000063  protein:BGIBMGA000063-PA
  >BGIBMGA000064-TA  cds:novel  sequence:nscaf1071:1313232:1316001:+  gene:BGIBMGA000064  protein:BGIBMGA000064-PA
  >BGIBMGA000065-TA  cds:novel  sequence:nscaf1087:2451:9117:+  gene:BGIBMGA000065  protein:BGIBMGA000065-PA
  >BGIBMGA000066-TA  cds:novel  sequence:nscaf109:1549:4332:+  gene:BGIBMGA000066  protein:BGIBMGA000066-PA
  >BGIBMGA000067-TA  cds:novel  sequence:nscaf1108:3401547:3415278:-  gene:BGIBMGA000067  protein:BGIBMGA000067-PA
  >BGIBMGA000068-TA  cds:novel  sequence:nscaf1108:2724200:2728350:-  gene:BGIBMGA000068  protein:BGIBMGA000068-PA
  >BGIBMGA000069-TA  cds:novel  sequence:nscaf1108:2546311:2574462:-  gene:BGIBMGA000069  protein:BGIBMGA000069-PA
  >BGIBMGA000070-TA  cds:novel  sequence:nscaf1108:2533757:2534560:-  gene:BGIBMGA000070  protein:BGIBMGA000070-PA
  >BGIBMGA000071-TA  cds:novel  sequence:nscaf1108:2527802:2528488:-  gene:BGIBMGA000071  protein:BGIBMGA000071-PA
  >BGIBMGA000072-TA  cds:novel  sequence:nscaf1108:2442934:2452908:-  gene:BGIBMGA000072  protein:BGIBMGA000072-PA
  >BGIBMGA000073-TA  cds:novel  sequence:nscaf1108:2382927:2390096:-  gene:BGIBMGA000073  protein:BGIBMGA000073-PA
  >BGIBMGA000074-TA  cds:novel  sequence:nscaf1108:2370811:2374956:-  gene:BGIBMGA000074  protein:BGIBMGA000074-PA
  >BGIBMGA000075-TA  cds:novel  sequence:nscaf1108:2350982:2353168:-  gene:BGIBMGA000075  protein:BGIBMGA000075-PA
  >BGIBMGA000076-TA  cds:novel  sequence:nscaf1108:2305769:2312842:-  gene:BGIBMGA000076  protein:BGIBMGA000076-PA
  >BGIBMGA000077-TA  cds:novel  sequence:nscaf1108:2238841:2239515:-  gene:BGIBMGA000077  protein:BGIBMGA000077-PA
  >BGIBMGA000078-TA  cds:novel  sequence:nscaf1108:2207199:2212812:-  gene:BGIBMGA000078  protein:BGIBMGA000078-PA
  >BGIBMGA000079-TA  cds:novel  sequence:nscaf1108:2140463:2140972:-  gene:BGIBMGA000079  protein:BGIBMGA000079-PA
  >BGIBMGA000080-TA  cds:novel  sequence:nscaf1108:2116762:2129703:-  gene:BGIBMGA000080  protein:BGIBMGA000080-PA
  >BGIBMGA000081-TA  cds:novel  sequence:nscaf1108:2004448:2004807:-  gene:BGIBMGA000081  protein:BGIBMGA000081-PA
  >BGIBMGA000082-TA  cds:novel  sequence:nscaf1108:1763047:1773049:-  gene:BGIBMGA000082  protein:BGIBMGA000082-PA
  >BGIBMGA000083-TA  cds:novel  sequence:nscaf1108:1689741:1695693:-  gene:BGIBMGA000083  protein:BGIBMGA000083-PA
  >BGIBMGA000084-TA  cds:novel  sequence:nscaf1108:1444455:1452371:-  gene:BGIBMGA000084  protein:BGIBMGA000084-PA
  >BGIBMGA000085-TA  cds:novel  sequence:nscaf1108:1417113:1421224:-  gene:BGIBMGA000085  protein:BGIBMGA000085-PA
  >BGIBMGA000086-TA  cds:novel  sequence:nscaf1108:1380545:1384875:-  gene:BGIBMGA000086  protein:BGIBMGA000086-PA
  >BGIBMGA000087-TA  cds:novel  sequence:nscaf1108:1377062:1378515:-  gene:BGIBMGA000087  protein:BGIBMGA000087-PA
  >BGIBMGA000088-TA  cds:novel  sequence:nscaf1108:1348800:1362931:-  gene:BGIBMGA000088  protein:BGIBMGA000088-PA
  >BGIBMGA000089-TA  cds:novel  sequence:nscaf1108:1325168:1333036:-  gene:BGIBMGA000089  protein:BGIBMGA000089-PA
  >BGIBMGA000090-TA  cds:novel  sequence:nscaf1108:1242058:1246670:-  gene:BGIBMGA000090  protein:BGIBMGA000090-PA
  >BGIBMGA000091-TA  cds:novel  sequence:nscaf1108:1195516:1203324:-  gene:BGIBMGA000091  protein:BGIBMGA000091-PA
  >BGIBMGA000092-TA  cds:novel  sequence:nscaf1108:1188532:1189815:-  gene:BGIBMGA000092  protein:BGIBMGA000092-PA
  >BGIBMGA000093-TA  cds:novel  sequence:nscaf1108:853496:856414:-  gene:BGIBMGA000093  protein:BGIBMGA000093-PA
  >BGIBMGA000094-TA  cds:novel  sequence:nscaf1108:812670:813738:-  gene:BGIBMGA000094  protein:BGIBMGA000094-PA
  >BGIBMGA000095-TA  cds:novel  sequence:nscaf1108:761460:762528:-  gene:BGIBMGA000095  protein:BGIBMGA000095-PA
  >BGIBMGA000096-TA  cds:novel  sequence:nscaf1108:746557:746766:-  gene:BGIBMGA000096  protein:BGIBMGA000096-PA
  >BGIBMGA000097-TA  cds:novel  sequence:nscaf1108:699721:728407:-  gene:BGIBMGA000097  protein:BGIBMGA000097-PA
  >BGIBMGA000098-TA  cds:novel  sequence:nscaf1108:661420:670628:-  gene:BGIBMGA000098  protein:BGIBMGA000098-PA
  >BGIBMGA000099-TA  cds:novel  sequence:nscaf1108:651940:653728:-  gene:BGIBMGA000099  protein:BGIBMGA000099-PA
  >BGIBMGA000100-TA  cds:novel  sequence:nscaf1108:609611:612290:-  gene:BGIBMGA000100  protein:BGIBMGA000100-PA
  >BGIBMGA000101-TA  cds:novel  sequence:nscaf1108:585872:608290:-  gene:BGIBMGA000101  protein:BGIBMGA000101-PA
  >BGIBMGA000102-TA  cds:novel  sequence:nscaf1108:572734:581632:-  gene:BGIBMGA000102  protein:BGIBMGA000102-PA
  >BGIBMGA000103-TA  cds:novel  sequence:nscaf1108:539556:564561:-  gene:BGIBMGA000103  protein:BGIBMGA000103-PA
  >BGIBMGA000104-TA  cds:novel  sequence:nscaf1108:481032:500130:-  gene:BGIBMGA000104  protein:BGIBMGA000104-PA
  >BGIBMGA000105-TA  cds:novel  sequence:nscaf1108:474705:475046:-  gene:BGIBMGA000105  protein:BGIBMGA000105-PA
  >BGIBMGA000106-TA  cds:novel  sequence:nscaf1108:468795:469893:-  gene:BGIBMGA000106  protein:BGIBMGA000106-PA
  >BGIBMGA000107-TA  cds:novel  sequence:nscaf1108:448383:452888:-  gene:BGIBMGA000107  protein:BGIBMGA000107-PA
  >BGIBMGA000108-TA  cds:novel  sequence:nscaf1108:341698:343713:-  gene:BGIBMGA000108  protein:BGIBMGA000108-PA
  >BGIBMGA000109-TA  cds:novel  sequence:nscaf1108:292248:302760:-  gene:BGIBMGA000109  protein:BGIBMGA000109-PA
  >BGIBMGA000110-TA  cds:novel  sequence:nscaf1108:264694:281503:-  gene:BGIBMGA000110  protein:BGIBMGA000110-PA
  >BGIBMGA000111-TA  cds:novel  sequence:nscaf1108:259701:262677:-  gene:BGIBMGA000111  protein:BGIBMGA000111-PA
  >BGIBMGA000112-TA  cds:novel  sequence:nscaf1108:256846:258559:-  gene:BGIBMGA000112  protein:BGIBMGA000112-PA
  >BGIBMGA000113-TA  cds:novel  sequence:nscaf1108:246671:247303:-  gene:BGIBMGA000113  protein:BGIBMGA000113-PA
  >BGIBMGA000114-TA  cds:novel  sequence:nscaf1108:147245:154538:-  gene:BGIBMGA000114  protein:BGIBMGA000114-PA
  >BGIBMGA000115-TA  cds:novel  sequence:nscaf1108:130608:137309:+  gene:BGIBMGA000115  protein:BGIBMGA000115-PA
  >BGIBMGA000116-TA  cds:novel  sequence:nscaf1108:227655:230005:+  gene:BGIBMGA000116  protein:BGIBMGA000116-PA
  >BGIBMGA000117-TA  cds:novel  sequence:nscaf1108:238521:243968:+  gene:BGIBMGA000117  protein:BGIBMGA000117-PA
  >BGIBMGA000118-TA  cds:novel  sequence:nscaf1108:427906:434216:+  gene:BGIBMGA000118  protein:BGIBMGA000118-PA
  >BGIBMGA000119-TA  cds:novel  sequence:nscaf1108:438761:444662:+  gene:BGIBMGA000119  protein:BGIBMGA000119-PA
  >BGIBMGA000120-TA  cds:novel  sequence:nscaf1108:454682:455977:+  gene:BGIBMGA000120  protein:BGIBMGA000120-PA
  >BGIBMGA000121-TA  cds:novel  sequence:nscaf1108:466219:467505:+  gene:BGIBMGA000121  protein:BGIBMGA000121-PA
  >BGIBMGA000122-TA  cds:novel  sequence:nscaf1108:472642:474571:+  gene:BGIBMGA000122  protein:BGIBMGA000122-PA
  >BGIBMGA000123-TA  cds:novel  sequence:nscaf1108:518456:524771:+  gene:BGIBMGA000123  protein:BGIBMGA000123-PA
  >BGIBMGA000124-TA  cds:novel  sequence:nscaf1108:566241:567003:+  gene:BGIBMGA000124  protein:BGIBMGA000124-PA
  >BGIBMGA000125-TA  cds:novel  sequence:nscaf1108:675442:676344:+  gene:BGIBMGA000125  protein:BGIBMGA000125-PA
  >BGIBMGA000126-TA  cds:novel  sequence:nscaf1108:689561:693762:+  gene:BGIBMGA000126  protein:BGIBMGA000126-PA
  >BGIBMGA000127-TA  cds:novel  sequence:nscaf1108:776396:779523:+  gene:BGIBMGA000127  protein:BGIBMGA000127-PA
  >BGIBMGA000128-TA  cds:novel  sequence:nscaf1108:1267202:1280573:+  gene:BGIBMGA000128  protein:BGIBMGA000128-PA
  >BGIBMGA000129-TA  cds:novel  sequence:nscaf1108:1301497:1305441:+  gene:BGIBMGA000129  protein:BGIBMGA000129-PA
  >BGIBMGA000130-TA  cds:novel  sequence:nscaf1108:1333949:1340942:+  gene:BGIBMGA000130  protein:BGIBMGA000130-PA
  >BGIBMGA000131-TA  cds:novel  sequence:nscaf1108:1367146:1371308:+  gene:BGIBMGA000131  protein:BGIBMGA000131-PA
  >BGIBMGA000132-TA  cds:novel  sequence:nscaf1108:1391873:1412495:+  gene:BGIBMGA000132  protein:BGIBMGA000132-PA
  >BGIBMGA000133-TA  cds:novel  sequence:nscaf1108:1433224:1440508:+  gene:BGIBMGA000133  protein:BGIBMGA000133-PA
  >BGIBMGA000134-TA  cds:novel  sequence:nscaf1108:1548816:1549378:+  gene:BGIBMGA000134  protein:BGIBMGA000134-PA
  >BGIBMGA000135-TA  cds:novel  sequence:nscaf1108:1599404:1602012:+  gene:BGIBMGA000135  protein:BGIBMGA000135-PA
  >BGIBMGA000136-TA  cds:novel  sequence:nscaf1108:1638627:1665363:+  gene:BGIBMGA000136  protein:BGIBMGA000136-PA
  >BGIBMGA000137-TA  cds:novel  sequence:nscaf1108:1677792:1680353:+  gene:BGIBMGA000137  protein:BGIBMGA000137-PA
  >BGIBMGA000138-TA  cds:novel  sequence:nscaf1108:1736212:1737327:+  gene:BGIBMGA000138  protein:BGIBMGA000138-PA
  >BGIBMGA000139-TA  cds:novel  sequence:nscaf1108:1751394:1753674:+  gene:BGIBMGA000139  protein:BGIBMGA000139-PA
  >BGIBMGA000140-TA  cds:novel  sequence:nscaf1108:1773646:1800068:+  gene:BGIBMGA000140  protein:BGIBMGA000140-PA
  >BGIBMGA000141-TA  cds:novel  sequence:nscaf1108:1811651:1812675:+  gene:BGIBMGA000141  protein:BGIBMGA000141-PA
  >BGIBMGA000142-TA  cds:novel  sequence:nscaf1108:1867311:1869313:+  gene:BGIBMGA000142  protein:BGIBMGA000142-PA
  >BGIBMGA000143-TA  cds:novel  sequence:nscaf1108:1881514:1894351:+  gene:BGIBMGA000143  protein:BGIBMGA000143-PA
  >BGIBMGA000144-TA  cds:novel  sequence:nscaf1108:1966744:1993301:+  gene:BGIBMGA000144  protein:BGIBMGA000144-PA
  >BGIBMGA000145-TA  cds:novel  sequence:nscaf1108:2040221:2040984:+  gene:BGIBMGA000145  protein:BGIBMGA000145-PA
  >BGIBMGA000146-TA  cds:novel  sequence:nscaf1108:2067039:2079635:+  gene:BGIBMGA000146  protein:BGIBMGA000146-PA
  >BGIBMGA000147-TA  cds:novel  sequence:nscaf1108:2094499:2102185:+  gene:BGIBMGA000147  protein:BGIBMGA000147-PA
  >BGIBMGA000148-TA  cds:novel  sequence:nscaf1108:2104167:2104543:+  gene:BGIBMGA000148  protein:BGIBMGA000148-PA
  >BGIBMGA000149-TA  cds:novel  sequence:nscaf1108:2180746:2189972:+  gene:BGIBMGA000149  protein:BGIBMGA000149-PA
  >BGIBMGA000150-TA  cds:novel  sequence:nscaf1108:2193245:2197595:+  gene:BGIBMGA000150  protein:BGIBMGA000150-PA
  >BGIBMGA000151-TA  cds:novel  sequence:nscaf1108:2232343:2232987:+  gene:BGIBMGA000151  protein:BGIBMGA000151-PA

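The second column of the first file and the first column of the second file contain similar information, and the sixth column of the first file and the fourth column of the second file contain similar numeric (position) information.

My question is this: for each value in the second column of the first file, find the equal value in the second file, then compare the value in the sixth column of that row of the first file with the number in the fourth column of the matching row of the second file. If the value from the first file is larger, delete the row; otherwise keep the row, or write it out to a new file.
Experts, please advise!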
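The rule described in this post can be checked by hand on a few rows. The sketch below is in Python rather than Perl, and it rests on an assumption: "the fourth column of the second file" is read as the start coordinate inside the `sequence:scaffold:start:end:strand` field, since that is the only position-like number that lines up with column 6 of the first file. The names `cds_start`, `rows` and `keep_row` are illustrative only.

```python
# Start coordinates copied from the second file in the post
# (third number-bearing part of "sequence:scaffold:start:end:strand").
# Assumption: this start value is what the post calls "the fourth column".
cds_start = {
    "BGIBMGA000062-TA": 1239819,
    "BGIBMGA000128-TA": 1267202,
    "BGIBMGA000140-TA": 1773646,
}

# (ID from column 2, value from column 6) copied from rows of the first file.
rows = [
    ("BGIBMGA000062-TA", 1239395),
    ("BGIBMGA000128-TA", 1275912),
    ("BGIBMGA000140-TA", 1773552),
    ("BGIBMGA000140-TA", 1785487),
]


def keep_row(file1_value: int, file2_value: int) -> bool:
    """Delete the row when the first file's value is larger; keep it otherwise."""
    return file1_value <= file2_value


# Pair every row with its keep/delete decision.
decisions = [(tid, value, keep_row(value, cds_start[tid])) for tid, value in rows]

for tid, value, keep in decisions:
    print(tid, value, "keep" if keep else "delete")
```

Under that reading, the first and third rows are kept and the second and fourth are deleted. If the intended column is the end coordinate instead, only the dictionary values change.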

2# Posted on 2011-03-24 10:26
Last edited by yinyuemi on 2011-03-24 10:28
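  # file2 = the ">" header file, file1 = the table; keeps file1 rows whose column 6 is not larger than the start coordinate in file2
  awk 'NR==FNR{if(/^>/){split($3,p,":");s[substr($1,2)]=p[3]};next} ($2 in s)&&$6+0<=s[$2]+0' file2 file1 >output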
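For readers who cannot run awk, the same two-file join can be sketched in Python (again not Perl, though the logic ports directly). This is a sketch under stated assumptions, not a definitive implementation: the position compared is the start coordinate in the `sequence:scaffold:start:end:strand` field, both files are whitespace-delimited, and rows of the first file whose ID has no match in the second file are dropped. The function names and the file names in the usage note are hypothetical.

```python
def load_starts(header_lines):
    """Map ID -> start coordinate from '>ID ... sequence:scaf:start:end:strand ...' lines."""
    starts = {}
    for line in header_lines:
        if not line.startswith(">"):
            continue  # skip anything that is not a header line
        fields = line[1:].split()
        # Find the field that carries the coordinates, wherever it sits.
        seq = next((f for f in fields if f.startswith("sequence:")), None)
        if seq is None:
            continue
        parts = seq.split(":")  # ['sequence', scaffold, start, end, strand]
        if len(parts) >= 3 and parts[2].isdigit():
            starts[fields[0]] = int(parts[2])
    return starts


def filter_rows(table_lines, starts):
    """Yield rows of the first file whose column 6 is not larger than the start."""
    for line in table_lines:
        cols = line.split()
        if len(cols) < 6 or cols[1] not in starts:
            continue  # assumption: unmatched or short rows are dropped
        if int(cols[5]) <= starts[cols[1]]:
            yield line.rstrip("\n")


def run(file1_path, file2_path, out_path):
    """Read both files and write the surviving rows of the first file."""
    with open(file2_path) as fh:
        starts = load_starts(fh)
    with open(file1_path) as src, open(out_path, "w") as out:
        for row in filter_rows(src, starts):
            out.write(row + "\n")
```

Calling `run("file1", "file2", "output")` would write the kept rows to `output`. Splitting the `sequence:` field on colons works for both `+` and `-` strand entries, which matters because most of the second file is on the minus strand.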

3# Posted on 2011-03-24 15:51
Reply to 2# yinyuemi

    I'll try it and see whether it works!

4# Posted on 2011-03-24 16:33
Is there a way to do this in Perl? I can't use my virtual machine any more.

5# Posted on 2011-03-24 21:46
Reply to 2# yinyuemi

    I tried it and it doesn't seem to work, buddy.

6# Posted on 2011-03-24 21:54
Experts, please help. I urgently need a program for this.

7# Posted on 2011-03-24 22:18
Experts, please come and take a look!

8# Posted on 2011-03-24 22:26
Begging the experts for help.

9# Posted on 2011-03-25 03:43
Excel is the simplest way.

10# Posted on 2011-03-25 06:33
Reply to 9# greencow

    How do I do that in Excel, sir?