Last edited by qianyemlf on 2014-10-09 10:46
I need to extract the parts between "# start gene" and "###" from the file excerpted below. I suspect my script is badly wrong, so I'm posting it here for advice:

open IN, "C:/Users/john/Desktop/augustus_out.gff3";
open OUT, ">C:/Users/john/Desktop/augustus_out_c.gff3";
while (<IN>) {
    if(/^(start)(.*)(###)$/sm){
        print OUT "$_\n";
    }
}

# Predicted genes for sequence number 16416 on both strands
# (none)
#
# ----- prediction on sequence number 16417 (length = 195, name = 32833) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16417 on both strands
# (none)
#
# ----- prediction on sequence number 16418 (length = 195, name = 32835) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16418 on both strands
# start gene g327
32835 AUGUSTUS gene 1 195 0.59 - . g327
32835 AUGUSTUS transcript 1 195 0.59 - . g327.t1
32835 AUGUSTUS internal 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS CDS 1 195 0.59 - 0 transcript_id "g327.t1"; gene_id "g327";
32835 AUGUSTUS exon 1 195 . - . transcript_id "g327.t1"; gene_id "g327";
# coding sequence = [cgggcgcggcatggtggtccacgggtgcgcgccgcagctggccgtgctggcccacccgtcggtgggcgccttcgtgtcg
# cactgcgggtggagctcggtgctggaggcggcgtccgccggcgtgccggtgctggcgtggccgctggtgttcgagcagttcatcaacgagcggctggt
# caccgaggtggtggcgtt]
# protein sequence = [RARHGGPRVRAAAGRAGPPVGGRLRVALRVELGAGGGVRRRAGAGVAAGVRAVHQRAAGHRGGGV]
# end gene g327
###
#
# ----- prediction on sequence number 16419 (length = 195, name = 32837) -----
#
# Constraints/Hints:
# (none)
# Predicted genes for sequence number 16419 on both strands
# (none)
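A possible reason the script above prints nothing: `while (<IN>)` reads one line at a time, so a single pattern can never match from a start marker to a "###" on a later line, and the gene lines begin with "# start gene", not "start". One way around this is Perl's range (flip-flop) operator, sketched below. This is a minimal sketch, not a definitive fix. It assumes that in the real file each "# start gene" and "###" marker begins its own line with no leading list marker. The helper name `extract_gene_blocks` and the command-line paths are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Return the lines from each "# start gene" line through the next "###" line.
# Assumption: both markers sit at the very start of a line in the real file.
sub extract_gene_blocks {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context ".." is the range (flip-flop) operator: it becomes
        # true when the left pattern matches and stays true until the right
        # pattern matches, so both marker lines are included.
        push @kept, $line if $line =~ /^# start gene/ .. $line =~ /^###/;
    }
    return @kept;
}

# Hypothetical usage: perl extract.pl input.gff3 output.gff3
my ($in_path, $out_path) = @ARGV;
if (defined $in_path && defined $out_path) {
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    # Lines keep their original newline, so no extra "\n" is appended.
    print {$out} extract_gene_blocks(<$in>);
    close $in;
    close $out or die "Cannot close $out_path: $!";
}
```

The sketch reads the whole file into memory for simplicity. For a very large file, the same condition can be used directly inside a `while (my $line = <$in>)` loop that prints each kept line as it is read.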