This post was last edited by rdcwayx on 2014-02-12 13:24
Reply to 1# huang6894
Two assumptions:
1. c.[0-9]+ should be c.[0-9_]+
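2. $2 is always [ATCGN]+, with no other unrelated characters

# gensub() is a GNU awk extension, so awk must be gawk here
awk '$3~/c.[0-9_]+[ATCG]+>[ATCG]+/{$0=$0"\t"gensub(/.*>(.*)/,"\\1",1,$3);print;next}  # substitution: bases after ">"
$3~/c.[0-9_]+del.*ins.*/{$0=$0"\t"substr($2,1,1)gensub(/.*ins(.*)/,"\\1",1,$3);print;next}  # del + ins: first base of $2, then the inserted text
$3~/c.[0-9_]+del[ATCGN_]+/{$0=$0"\t"substr($2,1,1);print;next}  # plain deletion: first base of $2
$3~/c.[0-9_]+dup[ATCGN]+/{$0=$0"\t"$2$2;print;next}  # duplication: $2 twice
$3~/c.[0-9_]+ins[ATCGN0-9]+/{$0=$0"\t"$2gensub(/.*ins(.*)/,"\\1",1,$3);print;next}' i  # insertion: $2, then the inserted text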
Result; line 3 is slightly different:

T C c.1171C>T T
TT C c.1171C>TT TT
delN GA c.1191_1192delN G
T G c.136delT G
N GGGAAAGGAAGGCTA c.1227_1241delN>ins11N G11N
A G c.1237dupN GG
delN ATC c.1239dupATC ATCATC
delN ATC c.1239insATC ATCATC
delN CTATCTCCACGT c.1239_1250insN CTATCTCCACGTN
delN TATC c.117delATCG>insAA TAA
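
The two assumptions can be checked before relying on the fourth column. Below is a minimal sketch, assuming the same whitespace-separated three-column file i. The function name check_variants and the message wording are made up for illustration and are not part of the original answer. It uses only POSIX awk features, so the check itself does not need gawk. It also reports lines whose $3 matches none of the five patterns, because the command above prints nothing at all for such lines.

```shell
# check_variants (hypothetical helper): report lines that break assumption 2
# or that none of the five rules in the post would match.
check_variants() {
    awk '
    # assumption 2: the second column holds only A, T, C, G or N
    $2 !~ /^[ATCGN]+$/ { print "line " NR ": unexpected $2: " $2; bad = 1 }
    # the five patterns from the post, merged into one alternation
    $3 !~ /c.[0-9_]+([ATCG]+>[ATCG]+|del.*ins.*|del[ATCGN_]+|dup[ATCGN]+|ins[ATCGN0-9]+)/ {
        print "line " NR ": no rule matches $3: " $3; bad = 1
    }
    # non-zero exit status if anything was reported
    END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

Usage might look like `check_variants i && echo "assumptions hold"`. Any reported line is one to inspect by hand before trusting the awk output.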