
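Post #1

Posted 2014-10-09 10:45

Last edited by qianyemlf on 2014-10-09 10:46

I need to extract everything from the "# start gene" line up to the "###" line in the output below. The script I wrote seems to have serious problems, so I am posting this here and would appreciate any advice.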


# Predicted genes for sequence number 16416 on both strands
# (none)
#
# ----- prediction on sequence number 16417 (length = 195, name = 32833) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16417 on both strands
# (none)
#
# ----- prediction on sequence number 16418 (length = 195, name = 32835) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16418 on both strands
# start gene g327
32835 AUGUSTUS gene 1 195 0.59 - . g327
32835 AUGUSTUS transcript 1 195 0.59 - . g327.t1
32835 AUGUSTUS internal 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS CDS 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS exon 1 195 . - . transcript_id "g327.t1"; gene_id "g327";
# coding sequence = [cgggcgcggcatggtggtccacgggtgcgcgccgcagctggccgtgctggcccacccgtcggtgggcgccttcgtgtcg
# cactgcgggtggagctcggtgctggaggcggcgtccgccggcgtgccggtgctggcgtggccgctggtgttcgagcagttcatcaacgagcggctggt
# caccgaggtggtggcgtt]
# protein sequence = [RARHGGPRVRAAAGRAGPPVGGRLRVALRVELGAGGGVRRRAGAGVAAGVRAVHQRAAGHRGGGV]
# end gene g327
###
#
# ----- prediction on sequence number 16419 (length = 195, name = 32837) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16419 on both strands
# (none)


Post #2

Posted 2014-10-09 12:43

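#!/usr/bin/perl
use strict;
use warnings;

# Copy each block from "# start gene" up to (but not including) "###" into out.gff3.
open(my $out_fh, '>', 'out.gff3') or die "Cannot open out.gff3: $!";
my $in_gene = 0;    # true while the current line is inside a gene block
while (my $line = <DATA>) {
    if ($line =~ /^# start gene/) {
        # Opening line of a block: keep it and start copying.
        $in_gene = 1;
        print {$out_fh} $line;
    } elsif ($line =~ /^###/) {
        # Terminator line: stop copying and do not keep it.
        $in_gene = 0;
    } elsif ($in_gene) {
        print {$out_fh} $line;
    }
}
close($out_fh) or die "Cannot close out.gff3: $!";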
__DATA__
# Predicted genes for sequence number 16416 on both strands
# (none)
#
# ----- prediction on sequence number 16417 (length = 195, name = 32833) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16417 on both strands
# (none)
#
# ----- prediction on sequence number 16418 (length = 195, name = 32835) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16418 on both strands
# start gene g327
32835 AUGUSTUS gene 1 195 0.59 - . g327
32835 AUGUSTUS transcript 1 195 0.59 - . g327.t1
32835 AUGUSTUS internal 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS CDS 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS exon 1 195 . - . transcript_id "g327.t1"; gene_id "g327";
# coding sequence = [cgggcgcggcatggtggtccacgggtgcgcgccgcagctggccgtgctggcccacccgtcggtgggcgccttcgtgtcg
# cactgcgggtggagctcggtgctggaggcggcgtccgccggcgtgccggtgctggcgtggccgctggtgttcgagcagttcatcaacgagcggctggt
# caccgaggtggtggcgtt]
# protein sequence = [RARHGGPRVRAAAGRAGPPVGGRLRVALRVELGAGGGVRRRAGAGVAAGVRAVHQRAAGHRGGGV]
# end gene g327
###
#
# ----- prediction on sequence number 16419 (length = 195, name = 32837) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16419 on both strands
# (none)

Result written to out.gff3:

# start gene g327
32835 AUGUSTUS gene 1 195 0.59 - . g327
32835 AUGUSTUS transcript 1 195 0.59 - . g327.t1
32835 AUGUSTUS internal 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS CDS 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS exon 1 195 . - . transcript_id "g327.t1"; gene_id "g327";
# coding sequence = [cgggcgcggcatggtggtccacgggtgcgcgccgcagctggccgtgctggcccacccgtcggtgggcgccttcgtgtcg
# cactgcgggtggagctcggtgctggaggcggcgtccgccggcgtgccggtgctggcgtggccgctggtgttcgagcagttcatcaacgagcggctggt
# caccgaggtggtggcgtt]
# protein sequence = [RARHGGPRVRAAAGRAGPPVGGRLRVALRVELGAGGGVRRRAGAGVAAGVRAVHQRAAGHRGGGV]
# end gene g327
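
The flag logic in the script can also be written with Perl's range (flip-flop) operator. The following is only a minimal sketch, assuming every block opens with "# start gene" and closes with "###" as in the sample; the names extract_gene_blocks and @sample are made up for illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep the lines of every block
# that starts at "# start gene" and ends just before the "###" line.
sub extract_gene_blocks {
    my @kept;
    for my $line (@_) {
        # In scalar context ".." is the flip-flop operator: it turns true on a
        # line matching the left pattern and stays true through the next line
        # matching the right pattern.
        if ($line =~ /^# start gene/ .. $line =~ /^###/) {
            push @kept, $line unless $line =~ /^###/;
        }
    }
    return @kept;
}

# Shortened sample in the same layout as the AUGUSTUS output in the question.
my @sample = (
    "# (none)\n",
    "# start gene g327\n",
    "32835 AUGUSTUS gene 1 195 0.59 - . g327\n",
    "# end gene g327\n",
    "###\n",
    "#\n",
);
print extract_gene_blocks(@sample);
```

Run as is, this prints the three sample lines from "# start gene g327" to "# end gene g327". For a real file, the same condition can be used inside a while loop over the input filehandle so that the file is not held in memory all at once.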


Urgent help with a Perl multi-line matching problem