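Two problems. I've asked a lot of people without getting an answer, so I'm asking again here.
I took a stab at the first problem in awk.
I used the dumbest possible approach, so it is very inefficient: processing this datafile took about 10 s, and it only works for fairly short strings...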
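    BEGIN {
        _MIN_LEN = 10;    # shortest substring length to consider
    }

    # Scan the current record for a repeated substring, longest length first.
    # Returns 0 after printing a match, 1 if nothing was found.
    function find_max_str()
    {
        for (i = length($0) / 2; i >= _MIN_LEN; i--) {
            # Note: awk string positions start at 1, so the j = 0 pass asks
            # substr() for a start before the string; the result of that call
            # may differ between awk implementations.
            for (j = 0; j < length($0) - i; j++) {
                k = substr($0, j, i);
                # Skip candidates that contain a space or lack any of A, C, G, T.
                if ((index(k, " ") > 0) || !((index(k, "A") > 0 && index(k, "C") > 0 && index(k, "G") > 0 && index(k, "T") > 0))) {
                    continue;
                }
                # a[k] counts how often k has been seen; b[k] is its most recent start.
                # The test is true once k has already been counted twice.
                if ((a[k]++) == 2) {
                    # Output: record number : previous start : current start : length : substring
                    printf("%d:%d:%d:%d:%s\n", NR, b[k], j, length(k), k);
                    return 0;
                }
                else {
                    b[k] = j;
                }
            }
            # Reset both tables before trying the next, shorter length.
            for (idx1 in a) {
                delete a[idx1];
            }
            for (idx2 in b) {
                delete b[idx2];
            }
        }
        return 1;
    }

    {
        find_max_str();
    }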
    time awk -f a.awk datafile
    155:1:24:12:ACTGACTGACTG
    156:1:33:31:AAAAAAAACCCCCCCCGGGGGGGGTTTTTTT
    real 9.3
    user 9.3
    sys 0.0
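For comparison, here is a minimal sketch of the same brute-force idea written with 1-based positions. It is not the script above. It assumes the task is to find, per line, the longest substring of at least 10 characters that contains A, C, G and T, has no spaces, and occurs twice without overlapping. That is only a guess, since the problem statement is not quoted in this post. The names `find_repeat` and `MIN_LEN` are made up for illustration.

```sh
#!/bin/sh
# Sketch only: find_repeat and MIN_LEN are hypothetical names, not from the post.
# Assumed task: per input line, report the longest substring (>= MIN_LEN chars,
# no spaces, containing A, C, G and T) that occurs twice without overlapping.
find_repeat() {
    awk -v min_len="${MIN_LEN:-10}" '
    {
        n = length($0)
        # Two non-overlapping copies cannot be longer than half the line.
        for (len = int(n / 2); len >= min_len; len--) {
            split("", first)    # portable way to empty the array
            for (pos = 1; pos <= n - len + 1; pos++) {    # awk strings are 1-based
                s = substr($0, pos, len)
                if (index(s, " ") || !(index(s, "A") && index(s, "C") && index(s, "G") && index(s, "T"))) {
                    continue
                }
                if (!(s in first)) {
                    first[s] = pos    # earliest start of this substring
                } else if (pos >= first[s] + len) {
                    # This copy starts after the earliest copy ends.
                    printf "%d:%d:%d:%d:%s\n", NR, first[s], pos, len, s
                    next    # longest match for this line found; move on
                }
            }
        }
    }' "$@"
}
```

For example, `printf 'ACTGACTGACTGNNNNNNNNNNNACTGACTGACTG\n' | find_repeat` should print `1:1:24:12:ACTGACTGACTG`. It is still brute force, so it will not be fast on long lines; it only removes the dependence on how `substr()` treats a start of 0.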