- 论坛徽章:
- 0
|
不好意思,又来麻烦您,您之前帮忙写的脚本非常好用,受益匪浅,我想在问您一下,要是在这两个文件事例中加一点文件限制条件,其它的匹配模式都不变,这样是不是直接加一些条件语句就好了?
例如:在id匹配上之后,读取文件b时对Identities = 49/56 (88%)即序列相似性大于90%才进行序列匹配输出,否则直接跳出忽略,
这样您写的程序我尝试修改了一下,结果功夫不到家,只好又来请教,
这是您上次写的,麻烦您在帮忙给看看
O(∩_∩)O谢谢
#!/usr/bin/perl
use strict;
use warnings;
sub collect_data{
my (%ha, %hFilt, $id);
open (my $fhA, '<', shift);
while (defined (local $_ = <$fhA> ){
if (/\A([a-z]\S+)/i){
%hFilt = () if (!exists $ha{$1});
$id = $1;
next;
}
push (@{$ha{$id}}, $1) if (/\A\s*(\d+)/ and !$hFilt{$1}++);
}
close ($fhA);
\%ha;
}
sub compare{
my ($id, $rh, @aData) = (@_);
my %hStat;
foreach my $v (@{$rh->{$id}}){
foreach my $V (grep {$v >= $_->[0] && $v <= $_->[2]} @aData){
my ($cnt, $leng) = (0, $v - $V->[0] + 1);
if ((local $_ = substr ($V->[1], 0, $leng)) =~ /-/){
$cnt = s/-//g;
$cnt += length ($1) if (substr ($V->[1], $leng - 2, 1) eq '-' and substr ($V->[1], $leng) =~ /\A(-+)/);
}
my @aChars = split (//, $V->[4]);
next if ($aChars[$v - $V->[0] + $cnt] eq '-');
$hStat{$aChars[$v - $V->[0] + $cnt]}++;
}
}
if (%hStat){
print "$id\t";
print "$_ hStat{$_}\t" for keys %hStat;
print "\n";
}
}
my $rha = collect_data (shift);
open (my $fh, '<', shift);
my ($Flag, $Offset, $ID, @aData) = 0;
while (defined (local $_ = <$fh> ){
next if (/\A\s*\z/ or !/\A(? uery|Sbjct)/);
if (/\AQuery=\h+(\S+)/){
compare ($ID, $rha, splice (@aData)) if (@aData);
exists ($rha->{$1}) ? do {($ID, $Flag, $Offset) = ($1, 1, -1)} : ($Flag = 0);
next;
}
next if (!$Flag);
if (/\A(Query|Sbjct)\h+(\d+)\h+(\H+)\h+(\d+)/){
push (@{$aData[$1 eq "Query" ? ++$Offset : $Offset]}, $2, $3, $4);
}
}
close ($fh);
compare ($ID, $rha, splice (@aData)) if ($Flag);
文件A:
WP_000636925.1-1
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
WP_000567766.1-1
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
77
79
79
79
80
80
80
80
80
80
WP_000636925.1-1
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
63
62
WP_001341795.1-1
24
24
24
24
24
文件B:
BLASTP 2.2.29+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
Query= WP_001386572.1-1
Length=21
***** No hits found *****
Lambda K H a alpha
0.314 0.134 0.424 0.792 4.96
Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6
Effective search space used: 1206911076
Query= WP_000636925.1-1
Length=105
Score E
Sequences producing significant alignments: (Bits) Value
lcl|CU928161.2_prot_CAR03545.1_2173 [gene=ECS88_2260] [protein=c... 104 2e-29
>lcl|CU928161.2_prot_CAR03545.1_2173 [gene=ECS88_2260] [protein=conserved hypothetical protein] [protein_id=CAR03545.1]
[location=2240570..2240740]
Length=56
Score = 104 bits (260), Expect = 2e-29, Method: Compositional matrix adjust.
Identities = 49/56 (88%), Positives = 52/56 (93%), Gaps = 0/56 (0%)
Query 8 LLFTIMRSLYLKIRFPSDLRSINALNYLQVPSIDWSDSGYVQHFINVIEKMLTRKK 63
+LFTIMRSLYLK RFPSDL+SINALNYLQVPSIDWSDSGY+QHFINVIEKM T K
Sbjct 1 MLFTIMRSLYLKSRFPSDLKSINALNYLQVPSIDWSDSGYIQHFINVIEKMPTTNK 56
Lambda K H a alpha
0.325 0.135 0.407 0.792 4.96
Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6
Effective search space used: 1406358592
Query= WP_000636925.1-1
Length=105
Score E
Sequences producing significant alignments: (Bits) Value
lcl|CU928161.2_prot_CAR03545.1_2173 [gene=ECS88_2260] [protein=c... 104 2e-29
>lcl|CU928161.2_prot_CAR03545.1_2173 [gene=ECS88_2260] [protein=conserved hypothetical protein] [protein_id=CAR03545.1]
[location=2240570..2240740]
Length=56
Score = 104 bits (260), Expect = 2e-29, Method: Compositional matrix adjust.
Identities = 49/56 (88%), Positives = 52/56 (93%), Gaps = 0/56 (0%)
Query 8 LLFTIMRSLYLKIRFPSDLRSINALNYLQVPSIDWSDSGYVQHFINVIEKMLTRKK 63
+LFTIMRSLYLK RFPSDL+SINALNYLQVPSIDWSDSGY+QHFINVIEKM T K
Sbjct 1 MLFTIMRSLYLKSRFPSDLKSINALNYLQVPSIDWSDSGYIQHFINVIEKMPTTNK 56
Lambda K H a alpha
0.325 0.135 0.407 0.792 4.96
Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6
Effective search space used: 1406358592
回复 43# sunzhiguolu
|
|