Last edited by 56836430 on 2013-05-31 04:10
I have two files:
file1:
Scaffold_10015 Gmira_TNET11_REPET_TEs match_part 1997 2603 0.0
Scaffold_10015 Gmira_TNET11_REPET_TEs match 20118 22709 0.0
Scaffold_10016 Gmira_TNET11_REPET_TEs match 20118 21850 0.0
file2:
Scaffold10015 1800 - 0 0 CHH CTT
Scaffold10015 1999 - 0 0 CHH CTA
Scaffold10015 2400 - 0 0 CHH CAA
Scaffold10015 3000 - 0 0 CHH CAC
Scaffold10016 20013 + 0 0 CHH CAT
Scaffold10016 21760 - 0 6 CHH CAT
Scaffold10017 47000 - 0 6 CHH CTA
I want to get the following result:
Result
Scaffold10015 1800 - 0 0 CHH CTT other
Scaffold10015 1999 - 0 0 CHH CTA match_part
Scaffold10015 2400 - 0 0 CHH CAA match_part
Scaffold10015 3000 - 0 0 CHH CAC other
Scaffold10016 20013 + 0 0 CHH CAT other
Scaffold10016 21760 - 0 6 CHH CAT match
Scaffold10017 47000 - 0 6 CHH CTA other
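Basically, if the value in column 2 of file2 lies between column 4 and column 5 of a file1 line for the same scaffold, append that line's label (match or match_part, from column 3) to the end; if it lies outside every range, append other.
I made a small change to another awk program that an expert wrote for me earlier, but no matter what I try I can't get the correct answer...
Here is my modified program. Could someone help me see what's wrong?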
awk 'NR==FNR{if (/(exon|intron)/) {sca[$1]++;str=$1 FS sca[$1];a[str]=$3;b[str]=$4;c[str]=$5;d[str]=$(NF-2)};next}
{t=0;if ($1 in sca) {for (i=1;i<=sca[$1];i++) if ($2>=b[$1 FS i]&&$2<=c[$1 FS i]) {print $0,d[$1 FS i],a[$1 FS i];t=1} }
else {print "Scaffold",$1,"is not existed"}
{if (!t) {print $0,"other","other"}}
}' file1 file2
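Reading the script against the sample data, three things look like the likely cause (my own reading, not checked against any other data). First, the file1 block only stores lines matching /(exon|intron)/, but column 3 of file1 is match or match_part, so nothing is ever stored. Second, file1 spells the scaffold Scaffold_10015 while file2 spells it Scaffold10015, so `$1 in sca` stays false even after the first fix. Third, a line whose scaffold is missing from file1 goes through both the else branch and `if (!t)`, printing an extra "is not existed" line, and every label is printed twice (`d[...]` is `$(NF-2)`, which for six-column file1 is just the start coordinate). Below is a minimally patched sketch that keeps the sca/t structure. The function name `annotate`, the scratch directory, and the assumption that dropping the underscore is the right way to reconcile the two spellings are mine. It also stops at the first matching interval, so overlapping intervals yield one line per record.

```shell
#!/bin/sh
# Recreate the two sample files from the post in a scratch directory (demo only).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
Scaffold_10015 Gmira_TNET11_REPET_TEs match_part 1997 2603 0.0
Scaffold_10015 Gmira_TNET11_REPET_TEs match 20118 22709 0.0
Scaffold_10016 Gmira_TNET11_REPET_TEs match 20118 21850 0.0
EOF
cat > "$tmp/file2" <<'EOF'
Scaffold10015 1800 - 0 0 CHH CTT
Scaffold10015 1999 - 0 0 CHH CTA
Scaffold10015 2400 - 0 0 CHH CAA
Scaffold10015 3000 - 0 0 CHH CAC
Scaffold10016 20013 + 0 0 CHH CAT
Scaffold10016 21760 - 0 6 CHH CAT
Scaffold10017 47000 - 0 6 CHH CTA
EOF

# Hypothetical helper: patched version of the script above; comments mark the changes.
annotate() {
  awk 'NR==FNR {
         if ($3 ~ /match/) {              # was /(exon|intron)/, which no file1 line contains
           k = $1; sub(/_/, "", k)        # assumption: Scaffold_10015 is the same as Scaffold10015
           sca[k]++; str = k FS sca[k]
           a[str] = $3; b[str] = $4; c[str] = $5
         }
         next
       }
       { t = 0
         if ($1 in sca)
           for (i = 1; i <= sca[$1]; i++)
             if ($2 >= b[$1 FS i] && $2 <= c[$1 FS i]) { print $0, a[$1 FS i]; t = 1; break }
         if (!t) print $0, "other"        # scaffolds missing from file1 are also just "other"
       }' "$1" "$2"
}

annotate "$tmp/file1" "$tmp/file2"
```

On the sample data this prints the seven lines of the Result above. If the real file1 can contain other feature types, the `$3 ~ /match/` filter is the line to adjust.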