The first file:
CD837627
32498177
CD816237
The second file:
>gi|258673925|gb|GT084725 BnE__cultivar_Surpass 400
>gi|258673924|gb|GT084724 BnE__cultivar_Surpass 400
>gi|258673923|gb|GT084723 BnE__cultivar_Surpass 400
>gi|258673922|gb|GT084722 BnE__cultivar_Surpass 400
>gi|258673921|gb|GT084721 BnE__cultivar_Surpass 400
>gi|258673920|gb|GT084720 BnE__cultivar_Surpass 400
>gi|258673919|gb|GT084719 BnE__cultivar_Surpass 400
>gi|258673918|gb|GT084718 BnE__cultivar_S
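I want to search for each entry of the first file, one by one, in the second file.
My plan was to use $/ to read the first file record by record, use $/ again to read the second file record by record, and then search with "=~". But reading with $/ seems to get mixed up, and I don't know what other approach would solve this.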
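For reference, here is a minimal sketch of how $/ governs a line read. It assumes the first file holds one ID per line; the in-memory string is a stand-in for that file, not real input handling. The point it illustrates is that every record read with $/ = "\n" still ends in the newline until chomp removes it, which matters when the record is later used as a search pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for the first file
my $data = "CD837627\n32498177\nCD816237\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";

my @ids;
{
    # local restores the previous value of $/ when this block exits
    local $/ = "\n";
    while (defined(my $line = <$fh>)) {
        # Each record still ends with the separator until chomp removes it
        chomp $line;
        push @ids, $line;
    }
}
close($fh);

print scalar(@ids), " IDs read: @ids\n";
```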
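Below is my faulty attempt. It runs without errors, but under the debugger (perl -d), $gigb, i.e. the record that get_next_record($fh2) should read from the second file, never gets any content. Could someone point out what is wrong? Many thanks!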
#!/usr/bin/perl
# parsing gi,Acc,organism,cultivator,sequence from the NCBI
use strict;
use warnings;
use FileHandle;
# Declare and initialize variables
my $fh1;
my $fh2;
my $gi;
my $gigb;
my $lib_gi = 'gigb.txt';
my $lib_gigb = 'grepgigb_tmp.txt';
# Get the name of the input
$fh1 = open_file($lib_gi);
print "waiting........\n";
# Open the output file, then search the second file once per ID
open(OUTFILE, '>', 'output.txt') or die "Cannot open output.txt: $!\n";
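while (defined($gi = get_next_record($fh1))) {
    # Strip the newline and any stray whitespace so the ID can match mid-line
    $gi =~ s/\s+//g;
    next if $gi eq '';
    # Reopen the second file so every ID is searched from the top
    $fh2 = open_file($lib_gigb);
    while (defined($gigb = get_next_record($fh2))) {
        chomp $gigb;
        # \Q...\E treats the ID as literal text rather than as a regex
        if ($gigb =~ /\Q$gi\E/) {
            print OUTFILE "$gigb\n";
        }
    }
    close($fh2);
}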
close (OUTFILE);
print "Work done! you can check the outfile in the output.txt\n";
exit;
###################################################################################
# Subroutines
###################################################################################
# open_file
#
# - given filename, set filehandle
sub open_file {
    my ($filename) = @_;
    my $fh;
    # Three-argument open for reading; report the OS error on failure
    unless (open($fh, '<', $filename)) {
        print "Cannot open file $filename: $!\n";
        exit;
    }
    return $fh;
}
# get_next_record
#
# - given a filehandle, return its next line (one record), or undef at end of file
sub get_next_record {
    my ($fh) = @_;
    # Read line by line; local restores the previous $/ when the sub returns
    local $/ = "\n";
    my $record = <$fh>;
    return $record;
}
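One possible alternative to the nested loop in the script above, sketched under the assumption that the ID list fits in memory: read all IDs first, join them into one alternation, and scan the second file a single time. The sub name grep_ids and the sample strings are hypothetical (258673924 is made up to produce a hit); the sub accepts either file names such as 'gigb.txt' and 'grepgigb_tmp.txt' or references to in-memory strings, since Perl's open handles both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines of $lib_file that contain any ID listed in $id_file.
# grep_ids is a hypothetical name, not part of the original script.
sub grep_ids {
    my ($id_file, $lib_file) = @_;

    open(my $ids_fh, '<', $id_file) or die "Cannot open $id_file: $!\n";
    my @ids;
    while (defined(my $id = <$ids_fh>)) {
        $id =~ s/\s+//g;                      # drop the newline and stray spaces
        push @ids, quotemeta($id) if length $id;
    }
    close($ids_fh);
    return () unless @ids;                    # an empty pattern would match every line

    # One alternation, compiled once, replaces the nested loop
    my $pattern = join('|', @ids);
    my $re = qr/$pattern/;

    open(my $lib_fh, '<', $lib_file) or die "Cannot open $lib_file: $!\n";
    my @hits;
    while (defined(my $line = <$lib_fh>)) {
        chomp $line;
        push @hits, $line if $line =~ $re;
    }
    close($lib_fh);
    return @hits;
}

# Hypothetical sample data; pass file names instead for real input
my $ids = "CD837627\n258673924\n";
my $lib = ">gi|258673925|gb|GT084725 BnE__cultivar_Surpass 400\n"
        . ">gi|258673924|gb|GT084724 BnE__cultivar_Surpass 400\n";
print "$_\n" for grep_ids(\$ids, \$lib);
```

Note that hits come out in the order of the second file rather than grouped by ID, and that matching is by substring, so a short ID could also match inside a longer number.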