ChinaUnix.net


[Bioinformatics] Help needed with a Perl script

#1 | Posted 2017-02-15 21:40
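I have compiled my BLAST results: a few hundred sequences aligned against one genome, giving more than 10,000 hits. I'd like to write a small script to process this data. What it should do: among the dozens of hits for each sequence, pick the one with the highest score, so the output contains exactly one best hit per sequence.
Below is the script I wrote (but it doesn't do this T.T, please help, experts!!):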
#!/usr/bin/perl -w

my $FIN=shift;
open(IN,"< $FIN")||die;
open(FOUT,"> $0.out.txt")||die;

my %best;
while(<IN>){
    chomp;
    my @a=split(/\t/);

    $queryid=$a[0];
    $score=$a[11];

    $best{$queryid}{$score}=$_;
    foreach my $queryid(sort keys %best){
        foreach my $score(sort{$b<=>$a} keys %{$best{$queryid}}){
            print FOUT "{$queryid}{$score}=>$_\n";
            last;
        }
    }
}
close IN;
close FOUT;
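A few things keep the script above from working: the output loop sits inside the `while (<IN>)` loop, so the best hits of every query seen so far are re-printed after each input line; it prints `$_` (the line just read) instead of the stored `$best{$queryid}{$score}`; and because scores are hash keys, two hits of one query with the same score overwrite each other. Below is a minimal corrected sketch of the same idea. It assumes, as the original `split(/\t/)` and `$a[11]` imply, tab-separated input with the query ID in field 1 and the bit score in field 12; the sub name `best_per_query` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated BLAST tabular lines from $fh and
# return the single highest-scoring line per query, in first-seen order.
# Assumes the query ID is field 1 and the bit score is field 12 (index 11).
# On a tie, the first line seen is kept.
sub best_per_query {
    my ($fh) = @_;
    my (%best_score, %best_line, @order);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comment and blank lines
        my @f = split /\t/, $line;
        my ($qid, $score) = @f[0, 11];
        if (!exists $best_score{$qid}) {
            push @order, $qid;                     # first hit for this query
        }
        elsif ($score <= $best_score{$qid}) {
            next;                                  # not better than the stored hit
        }
        $best_score{$qid} = $score;
        $best_line{$qid}  = $line;
    }
    return map { $best_line{$_} } @order;
}

# Same command-line interface as the original script:
#   perl script.pl blast_results.txt   (writes "$0.out.txt")
if (@ARGV) {
    my $in_file = shift @ARGV;
    open(my $in,  '<', $in_file)     or die "Cannot open $in_file: $!";
    open(my $out, '>', "$0.out.txt") or die "Cannot write $0.out.txt: $!";
    print {$out} "$_\n" for best_per_query($in);
    close $in;
    close $out;
}
```

Printing happens only once, after all input has been read. If every tied hit should be reported, the per-query value has to become a list of lines rather than a single line.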

#2 | Posted 2017-02-16 13:09
*sob* Can't any expert give me a hand?

#3 | Posted 2017-02-20 14:23
To the OP: do you have the actual data? It's hard to work this out from the description alone~

#4 | Posted 2017-02-22 22:52
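Reply to #3 华小飞_Perl: Hi, the data is shown in the image.
Column A is the sequence name, and each sequence has several alignment hits. My goal is to output, for each sequence, the one row among its hits with the highest score (column L).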
[Screenshot of the BLAST result table]
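Assuming the table in the screenshot is BLAST+ tabular output in its default 12-column layout (`-outfmt 6`), which matches the 12 fields in the sample rows posted later in this thread, spreadsheet column A is `qseqid` (Perl index 0) and column L is `bitscore` (index 11). A small sketch that just names the fields; the helper `parse_blast6` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default column order of BLAST+ tabular output (-outfmt 6).
# Spreadsheet column A = index 0, ..., column L = index 11.
my @BLAST6_COLS = qw(qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore);

# Hypothetical helper: split one tab- or comma-separated line into a
# hash keyed by column name.
sub parse_blast6 {
    my ($line) = @_;
    chomp $line;
    my @f = split /[\t,]/, $line;
    my %rec;
    @rec{@BLAST6_COLS} = @f;    # hash slice: name => value
    return \%rec;
}

my $rec = parse_blast6(
    "ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,211170,211430,3.00E-07,54.7\n");
print "$rec->{qseqid}\t$rec->{bitscore}\n";    # column A and column L
```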

#5 | Posted 2017-02-23 08:42
#!/usr/bin/perl
use strict;
use warnings;

# Collect [score, original line] pairs for each ID (first field).
my %hData = ();
while (<DATA>){
        # chomp strips the newline from the fields only; $_ keeps it for printing
        chomp (my @aT = split (/,/));
        my ($id, $val) = @aT[0, -1];
        push (@{$hData{$id}}, [$val, $_]);
}

# For each ID, sort its rows by score (descending) and print the top one.
foreach my $id (sort keys %hData){
        my ($best) = sort {$b->[0] <=> $a->[0]} @{$hData{$id}};
        print $best->[1];
}

__DATA__
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,211170,211430,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,214224,214484,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.27,81,44,2,28,108,207456,207680,2.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.46,78,42,2,31,108,193419,193634,3.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000011.2,38.16,76,46,1,72,147,650507,650283,7.00E-07,53.5
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,214239,214454,2.00E-24,77.8
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,214547,214675,2.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,211185,211400,6.00E-24,75.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,211493,211621,6.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,43.84,73,40,1,31,103,209012,209227,6.00E-18,68.6
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5116,43,21,0,99,141,209328,209456,6.00E-18,431
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48,75,39,0,26,100,207450,207674,3.00E-16,824
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48.57,70,36,0,31,100,193419,193628,4.00E-16,82
ENSXMAP00000017412,anllUMD3.1IGK000025.2,42.86,70,40,0,33,102,217170,217379,8.00E-13,68.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,40.62,32,19,0,106,137,217484,217579,8.00E-13,254
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,193823,193951,5.00E-07,54.3
ENSXMAP00000017412,anllUMD3.1IGK000025.2,59.52,42,17,0,99,140,207870,207995,1.00E-06,52.8
ENSXMAP00000017412,anllUMD3.1IGK000011.2,49.28,69,35,0,73,141,650486,650280,4.00E-11,66.6
ENSXMAP00000017436,anllUMD3.1IGK000015.2,60.53,76,30,0,31,106,49007223,49007450,1.00E-24,107
ENSXMAP00000017436,anllUMD3.1IGK000015.2,52.13,94,41,2,16,106,48997867,48998145,6.00E-24,105
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.89,76,32,0,31,106,49023242,49023469,1.00E-23,103
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49341123,49340896,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49340178,49339951,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49073678,49073905,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.53,73,31,0,34,106,49049276,49049494,6.00E-22,99

#6 | Posted 2017-02-23 11:03

For example, the first sequence has two maximum values (54.7) in column L. Should both of those records be kept?

#7 | Posted 2017-02-23 11:17
#!/usr/bin/perl
use strict;
use warnings;

# Collect [score, original line] pairs for each ID (first field).
my %hData = ();
while (<DATA>){
        chomp (my @aT = split (/,/));
        my ($id, $val) = @aT[0, -1];
        push (@{$hData{$id}}, [$val, $_]);
}

# For each ID, print every row whose score equals the maximum (ties kept).
foreach my $id (sort keys %hData){
        my @aT = sort {$b->[0] <=> $a->[0]} @{$hData{$id}};
        my $max = $aT[0][0];
        print map { $_->[1] } grep { $_->[0] == $max } @aT;
}

__DATA__
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,211170,211430,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,214224,214484,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.27,81,44,2,28,108,207456,207680,2.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.46,78,42,2,31,108,193419,193634,3.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000011.2,38.16,76,46,1,72,147,650507,650283,7.00E-07,53.5
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,214239,214454,2.00E-24,77.8
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,214547,214675,2.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,211185,211400,6.00E-24,75.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,211493,211621,6.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,43.84,73,40,1,31,103,209012,209227,6.00E-18,68.6
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5116,43,21,0,99,141,209328,209456,6.00E-18,431
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48,75,39,0,26,100,207450,207674,3.00E-16,824
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48.57,70,36,0,31,100,193419,193628,4.00E-16,82
ENSXMAP00000017412,anllUMD3.1IGK000025.2,42.86,70,40,0,33,102,217170,217379,8.00E-13,68.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,40.62,32,19,0,106,137,217484,217579,8.00E-13,254
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,193823,193951,5.00E-07,54.3
ENSXMAP00000017412,anllUMD3.1IGK000025.2,59.52,42,17,0,99,140,207870,207995,1.00E-06,52.8
ENSXMAP00000017412,anllUMD3.1IGK000011.2,49.28,69,35,0,73,141,650486,650280,4.00E-11,66.6
ENSXMAP00000017436,anllUMD3.1IGK000015.2,60.53,76,30,0,31,106,49007223,49007450,1.00E-24,107
ENSXMAP00000017436,anllUMD3.1IGK000015.2,52.13,94,41,2,16,106,48997867,48998145,6.00E-24,105
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.89,76,32,0,31,106,49023242,49023469,1.00E-23,103
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49341123,49340896,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49340178,49339951,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49073678,49073905,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.53,73,31,0,34,106,49049276,49049494,6.00E-22,99

#8 | Posted 2017-02-23 11:55

awk -F, 'NR==FNR{if($NF>a[$1])a[$1]=$NF;next}$NF==a[$1]' file file
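In this one-liner the file is passed twice. `NR==FNR` holds only while awk reads the first copy (the overall record number `NR` equals the per-file record number `FNR`); in that pass it stores the largest last field (`$NF`) seen for each first field (`$1`) in the array `a`, and `next` skips the rest of the program. On the second pass the bare pattern `$NF==a[$1]` prints every line whose score equals its ID's maximum, so ties are all kept. `-F,` matches the comma-separated sample; raw tab-separated BLAST output would need `-F'\t'`. A roughly equivalent Perl sketch of the same two-pass idea (the sub name `best_two_pass` is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two-pass Perl version of:
#   awk -F, 'NR==FNR{if($NF>a[$1])a[$1]=$NF;next}$NF==a[$1]' file file
# $src is a file name (or a reference to a string holding the data);
# $sep is ',' for the sample data or "\t" for raw BLAST tabular output.
sub best_two_pass {
    my ($src, $sep) = @_;
    my %max;

    # Pass 1: highest last-field value per ID (first field).
    open(my $fh, '<', $src) or die "Cannot open $src: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my @f = split /\Q$sep\E/, $line;
        my ($id, $val) = @f[0, -1];
        $max{$id} = $val if !exists $max{$id} || $val > $max{$id};
    }
    close $fh;

    # Pass 2: keep every line whose score equals its ID's maximum (ties included).
    my @out;
    open($fh, '<', $src) or die "Cannot open $src: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my @f = split /\Q$sep\E/, $line;
        push @out, $line if $f[-1] == $max{$f[0]};
    }
    close $fh;
    return @out;
}

# Usage: perl two_pass.pl file
if (@ARGV) {
    print "$_\n" for best_two_pass($ARGV[0], ',');
}
```

Reading the input twice keeps only one number per ID in memory and does not require the rows to be grouped or sorted.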

#9 | Posted 2017-02-23 15:28
#!/usr/bin/perl
use strict;
use warnings;

# @aData = (current ID, best score so far, best line(s)...)
# Single pass; assumes all rows of one ID are adjacent in the input.
my @aData = ();
while (<DATA>){
        chomp (my @aT = split (/,/));
        my ($id, $val) = @aT[0, -1];
        unless (@aData){                    # first line
                @aData = ($id, $val, $_);
                next;
        }
        if ($aData[0] ne $id){              # new ID: flush the previous group
                print splice (@aData, 2);
                @aData = ($id, $val, $_);
                next;
        }
        if ($aData[1] == $val){             # tie: keep this line too
                push (@aData, $_);
                next;
        }
        @aData = ($id, $val, $_) if ($aData[1] < $val);   # new best score
}

print splice (@aData, 2);                   # flush the last group

__DATA__
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,211170,211430,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,33.33,102,53,3,26,127,214224,214484,3.00E-07,54.7
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.27,81,44,2,28,108,207456,207680,2.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000025.2,38.46,78,42,2,31,108,193419,193634,3.00E-06,52
ENSXMAP00000010943,anllUMD3.1IGK000011.2,38.16,76,46,1,72,147,650507,650283,7.00E-07,53.5
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,214239,214454,2.00E-24,77.8
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,214547,214675,2.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,47.22,72,38,0,31,102,211185,211400,6.00E-24,75.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,211493,211621,6.00E-24,56.2
ENSXMAP00000017412,anllUMD3.1IGK000025.2,43.84,73,40,1,31,103,209012,209227,6.00E-18,68.6
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5116,43,21,0,99,141,209328,209456,6.00E-18,431
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48,75,39,0,26,100,207450,207674,3.00E-16,824
ENSXMAP00000017412,anllUMD3.1IGK000025.2,48.57,70,36,0,31,100,193419,193628,4.00E-16,82
ENSXMAP00000017412,anllUMD3.1IGK000025.2,42.86,70,40,0,33,102,217170,217379,8.00E-13,68.9
ENSXMAP00000017412,anllUMD3.1IGK000025.2,40.62,32,19,0,106,137,217484,217579,8.00E-13,254
ENSXMAP00000017412,anllUMD3.1IGK000025.2,5814,43,18,0,99,141,193823,193951,5.00E-07,54.3
ENSXMAP00000017412,anllUMD3.1IGK000025.2,59.52,42,17,0,99,140,207870,207995,1.00E-06,52.8
ENSXMAP00000017412,anllUMD3.1IGK000011.2,49.28,69,35,0,73,141,650486,650280,4.00E-11,66.6
ENSXMAP00000017436,anllUMD3.1IGK000015.2,60.53,76,30,0,31,106,49007223,49007450,1.00E-24,107
ENSXMAP00000017436,anllUMD3.1IGK000015.2,52.13,94,41,2,16,106,48997867,48998145,6.00E-24,105
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.89,76,32,0,31,106,49023242,49023469,1.00E-23,103
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49341123,49340896,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49340178,49339951,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,56.58,76,33,0,31,106,49073678,49073905,7.00E-23,101
ENSXMAP00000017436,anllUMD3.1IGK000015.2,57.53,73,31,0,34,106,49049276,49049494,6.00E-22,99
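This single-pass script only keeps the current ID's candidate rows in memory, but it silently assumes that all rows of one ID are adjacent. That is normally the case for BLAST tabular output, though not after the rows have been re-sorted, for example by score. A minimal sketch of the same idea with a guard against ungrouped input; the sub name `best_streaming` and its callback argument are assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical single-pass helper. Reads lines from $fh (fields split on $sep),
# keeps every line that ties for the highest last-field score within each run
# of identical IDs (field 1), and passes each finished group to $emit.
# Dies if an ID reappears after its group ended, i.e. the input is not grouped.
sub best_streaming {
    my ($fh, $sep, $emit) = @_;
    my ($cur_id, $cur_max, @keep);
    my %done;

    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my @f = split /\Q$sep\E/, $line;
        my ($id, $val) = @f[0, -1];

        if (!defined $cur_id || $id ne $cur_id) {
            die "Input not grouped by ID: $id reappears\n" if $done{$id};
            if (defined $cur_id) {
                $emit->(@keep);                  # flush the finished group
                $done{$cur_id} = 1;
            }
            ($cur_id, $cur_max, @keep) = ($id, $val, $line);
        }
        elsif ($val == $cur_max) {
            push @keep, $line;                   # tie: keep it too
        }
        elsif ($val > $cur_max) {
            ($cur_max, @keep) = ($val, $line);   # new best: drop earlier rows
        }
    }
    $emit->(@keep) if defined $cur_id;           # flush the last group
}

# Usage on raw tab-separated BLAST output: perl stream.pl blast_results.txt
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    best_streaming($fh, "\t", sub { print "$_\n" for @_ });
    close $fh;
}
```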

#10 | Posted 2017-02-28 11:12
Reply to #8 moperyblue

Hi, could you explain what this code does?