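niuguohao posted on 2018-06-28 15:43

Help: how do I output the row with the longest value in a given column of a file?

I have a genome file in which the first column is the chromosome name and the tenth column is the corresponding sequence. Among rows that share the same name in the first column, I want to output the row whose tenth-column sequence is the longest.

The file looks roughly like this: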

EAA1-10       0       scaffold1478    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATAATTGGAACGAAAT
EAA1-10       0       scaffold1478    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAAT
EAA1-11       0       scaffold147    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATAATTGGAACGAAAT
EAA1-12       0       scaffold100    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATTGGAACGAAATAATGGAACGAAA
...

Any guidance from the experts here would be much appreciated.

stanley_tam posted on 2018-06-28 23:36

Like this?
#!perl6
use v6.c;

sub MAIN {
    my $input_file = 'a.txt';
    my %chromosome_hash = %();
    for $input_file.IO.open(:chomp).lines -> $line {
      my @parts = $line.split(/\s+/);
      my $name = @parts[0];                      # column 1: chromosome name
      my $current_seq_length = @parts[9].chars;  # column 10: sequence

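      # Remember a line only when it is the first one seen for this name
      # or when its sequence is longer than the one stored so far.
      if %chromosome_hash{$name}<length>:exists {
            my $seq_length = %chromosome_hash{$name}<length>;
            if $current_seq_length > $seq_length {
                %chromosome_hash{$name}<length> = $current_seq_length;
                %chromosome_hash{$name}<line> = $line;
            }
      }
      else {
            %chromosome_hash{$name}<length> = $current_seq_length;
            %chromosome_hash{$name}<line> = $line;
      }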
    }

    # Print one line per name, ordered by name.
    for %chromosome_hash.keys.sort -> $name {
      %chromosome_hash{$name}<line>.say;
    }
}

niuguohao posted on 2018-06-29 10:17

Reply to 2# stanley_tam

Thank you very much!!
I think I roughly understand your approach, but running it produced these errors:
Scalar value @parts better written as $parts at delete.pl line 10.
Scalar value @parts better written as $parts at delete.pl line 11.
"my" variable %chromosome_hash masks earlier declaration in same scope at delete.pl line 16.
"my" variable $name masks earlier declaration in same scope at delete.pl line 16.
"my" variable $current_seq_length masks earlier declaration in same scope at delete.pl line 16.
"my" variable %chromosome_hash masks earlier declaration in same scope at delete.pl line 20.
"my" variable $name masks earlier declaration in same scope at delete.pl line 20.
"my" variable $current_seq_length masks earlier declaration in same scope at delete.pl line 20.
"my" variable %chromosome_hash masks earlier declaration in same scope at delete.pl line 22.
"my" variable $name masks earlier declaration in same scope at delete.pl line 22.
"my" variable %chromosome_hash masks earlier declaration in same scope at delete.pl line 25.
Warning: Use of "values" without parentheses is ambiguous at delete.pl line 25.
syntax error at delete.pl line 7, near "%() "
syntax error at delete.pl line 8, near "$input_file."
Global symbol "%line" requires explicit package name at delete.pl line 8.
syntax error at delete.pl line 13, near "if %chromosome_hash"
syntax error at delete.pl line 13, near "<length>:"
Execution of delete.pl aborted due to compilation errors.


stanley_tam posted on 2018-06-29 16:58

You need to install Perl 6; this script will not run under Perl 5.
https://perl6.org/downloads/

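niuguohao posted on 2018-07-13 09:16

Reply to 4# stanley_tam
My installation failed; the last step, make install, keeps running into problems. If I use Perl 5 instead, would I need to change lines like this one, with the dots in the middle?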
for $input_file.IO.open(:chomp).lines -> $line {
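
For comparison, a rough Perl 5 counterpart of that line might look like the sketch below. It is an illustration rather than code from the thread: `read_lines` is a hypothetical helper name, and the file path is taken from the command line instead of being hard-coded. Perl 5 has no method-call chain on a file name, so the file is opened explicitly and each line is chomped by hand.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return all lines of a file
# with the trailing newline removed, similar in spirit to the Perl 6
# expression $input_file.IO.open(:chomp).lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;          # strip the newline from $line itself
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Usage: perl script.pl a.txt
# The loop body is where the per-line logic from the Perl 6 script would go.
if (@ARGV) {
    for my $line (read_lines($ARGV[0])) {
        print "$line\n";
    }
}
```

The rest of the Perl 6 script needs similar changes under Perl 5, for example `$parts[0]` instead of `@parts[0]` and `length($parts[9])` instead of `@parts[9].chars`.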


b114213903 posted on 2018-08-17 11:20


my $Flag={};
my $Data={};
while(my $line=<DATA>){
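        chomp $line;    # a bare chomp would strip $_, not $line
        # Field 1 is the name; the sequence is the last field on each line.
        my($id,$seq)=(split(/\s+/,$line))[0,-1];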
        my $length=length($seq);
        if($Flag->{$id}){
                if($length>$Flag->{$id}){
                        $Flag->{$id}=$length;
                        $Data->{$id}=$line;
                }
        }else{
                $Flag->{$id}=$length;
                $Data->{$id}=$line;
        }
}
foreach my $id (sort keys %{$Data}){
        print "$Data->{$id}\n";
}


__DATA__
EAA1-10       0       scaffold1478    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATAATTGGAACGAAAT
EAA1-10       0       scaffold1478    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAAT
EAA1-11       0       scaffold147    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATAATTGGAACGAAAT
EAA1-12       0       scaffold100    49534160      24S4359M126I10900M64I917M65I2319M       *       0       0       AATTGGAACGAAATAATTGGAACGAAATTGGAACGAAATAATGGAACGAAA


ramsay_dan posted on 2018-08-18 19:42

#!/usr/bin/perl -w
use strict;
use 5.010;

open my $IN_1, '<', "orange_in_1.file" or die "can not read! $!\n";
open my $OUT, '>', "orange_out.file" or die "can not write! $!\n";

my %hash;
my @line;
while (<$IN_1>) {
    chomp;
    @line = split /\s+/;
    # Assumes 10 whitespace-separated columns, as the question states.
    # Store columns 2..10 under the column-1 name, replacing them only
    # when the new column-10 sequence is longer than the stored one.
    if (!exists $hash{$line[0]} || length($hash{$line[0]}->[8]) < length($line[9])) {
        $hash{$line[0]} = [ @line[1 .. 9] ];
    }
}

# Print one row per name, sorted by name.
print $OUT join(' ', $_, @{ $hash{$_} }), "\n" for sort keys %hash;
close $OUT;
Output: see the attached image.

ramsay_dan posted on 2018-08-18 20:24

Reply to 6# b114213903

Learning from this!

灿烂小猪 posted on 2019-01-02 16:19

Reply to 6# b114213903

:victory: