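I have a question about extracting records from a file and would appreciate help from the experts here:
1.txt looks like this (9 columns; the last column is the longest):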
chr1 Cufflinks exon 840450 841059 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000014"; exon_number "1"; gene_name "RP11-54O7.16"; oId "TCONS_00000006"; nearest_ref "ENST00000607769.1_1"; class_code "i"; tss_id "TSS13";
chr1 Cufflinks exon 841182 843900 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000014"; exon_number "2"; gene_name "RP11-54O7.16"; oId "TCONS_00000006"; nearest_ref "ENST00000607769.1_1"; class_code "i"; tss_id "TSS13";
chrMT Mt_tRNA exon 577 647 . + . gene_id "ENSG00000210049"; transcript_id "ENST00000387314"; exon_number "1"; gene_name "MT-TF"; gene_biotype "Mt_tRNA"; transcript_name "MT-TF-201"; exon_id "ENSE00001544501";
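The goal is to check, for each gene_id, whether any of its records has an exon_number greater than 1. If so, keep all records for that gene_id; if not, delete them.
The expected result is as follows (gene_id "ENSG00000210049" only has exon_number "1", with no exon_number "2" or higher, so it is removed):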
chr1 Cufflinks exon 840450 841059 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000014"; exon_number "1"; gene_name "RP11-54O7.16"; oId "TCONS_00000006"; nearest_ref "ENST00000607769.1_1"; class_code "i"; tss_id "TSS13";
chr1 Cufflinks exon 841182 843900 . + . gene_id "XLOC_000003"; transcript_id "TCONS_00000014"; exon_number "2"; gene_name "RP11-54O7.16"; oId "TCONS_00000006"; nearest_ref "ENST00000607769.1_1"; class_code "i"; tss_id "TSS13";
I'm not sure I've explained this clearly, so thanks in advance for bearing with me.
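One possible way to do the filtering described above is a two-pass approach: first record the largest exon_number seen for each gene_id, then keep only the lines whose gene_id reached a value above 1. The Python sketch below is just an illustration under a few assumptions: the function name `filter_multi_exon` is made up for this example, the attributes are assumed to always be written as `gene_id "..."` and `exon_number "..."` as in the sample, and the input is assumed small enough to hold in memory.

```python
import re
import sys
from collections import defaultdict

# Patterns for the two attributes, assuming the quoting style shown in 1.txt.
GENE_ID = re.compile(r'gene_id "([^"]+)"')
EXON_NUMBER = re.compile(r'exon_number "(\d+)"')


def filter_multi_exon(lines):
    """Keep only lines whose gene_id has some record with exon_number > 1.

    (Hypothetical helper name; not part of any library.)
    """
    lines = list(lines)

    # Pass 1: largest exon_number observed per gene_id.
    max_exon = defaultdict(int)
    for line in lines:
        gene = GENE_ID.search(line)
        exon = EXON_NUMBER.search(line)
        if gene and exon:
            gid = gene.group(1)
            max_exon[gid] = max(max_exon[gid], int(exon.group(1)))

    # Pass 2: keep every line of a gene_id that reached exon_number > 1.
    kept = []
    for line in lines:
        gene = GENE_ID.search(line)
        if gene and max_exon.get(gene.group(1), 0) > 1:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Usage: python filter_exons.py 1.txt > result.txt
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_multi_exon(handle))
```

Note that this groups by gene_id only, as the question asks. If the check should instead be per transcript_id, the same idea applies with a different pattern.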