Chinaunix

Thread starter: elaine2017

[Text processing] Text processing

#11
Posted 2018-05-16 09:34
Reply to #8 blackold

The real data format looks like this. Four rows are shown here, each with 19 columns:
E00548:177:HKH53CCXY:4:2101:31629:73229 ATGCGTACCACA    TACCAGCAGTTC    163     chr21   5013083 0       138M    =       5013132 187     CACACAGGCACACTACGTGCACACATACTCACACCACACACATACAGCCTTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGT      JJJJJJJJJJJJJJJJJFF<JJJJJJJJJJJJJJJJJJJJJJJJFJJFJJAJJJAJJJJJJJJJAJ<JJJJ<JFJFJFJJJJJJJJ<7J-777FFF-A7AFJFJJAJJ<JFJJJJJJJJJJJJJJ<JAA777FJJ-F)      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44250277,138M,0;    RG:Z:L004

E00548:177:HKH53CCXY:4:2214:23957:16516 TACCAGCAGTTC    ATGCGTACCACA    99      chr21   5013083 0       138M    =       5013132 187     CACACAGGCACACTACGTGCACACATACTCACACCACACACATACAGCCTTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGT      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJFJF      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44250277,138M,0;    RG:Z:L004

E00548:177:HKH53CCXY:4:2101:31629:73229 ATGCGTACCACA    TACCAGCAGTTC    83      chr21   5013132 0       138M    =       5013083 -187    TTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGTTCCAGACACTTCCAGGGTTGGGGAGGGACGGCCACACCTCAGCCACAGG      7-FFJFFJJJFFF7F-F7FJJJJJJFAA-JJ7<JJJAJFA7-JJJFFJJFF<JJJAAJJJFFJJJJJJJJJJJJJJJFJJ<JJJFJJF7-JJJFFJJJFJJJJ<JJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44250228,138M,0;    RG:Z:L004

E00548:177:HKH53CCXY:4:2214:23957:16516 TACCAGCAGTTC    ATGCGTACCACA    147     chr21   5013132 0       138M    =       5013083 -187    TTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGTTCCAGACACTTCCAGGGTTGGGGAGGGACGGCCACACCTCAGCCACAGG      JJJJJJAJAFAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44250228,138M,0;    RG:Z:L004



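What I want is based on column 11: the rows where, after taking the absolute value, the same number appears 4 or 8 times in a row, i.e. the 187, 187, -187, -187 here.
Then I used your code: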
$ awk 'NR==FNR{a[$11^2]++;next}a[$11^2]==4' whole_duplex_2.txt whole_duplex_2.txt|wc -l
0

The result is 0?
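
A side note on the command above: the `NR==FNR` pass counts how many times each squared value occurs in the whole file, so the `==4` test looks at file-wide totals rather than at consecutive runs. To see what the runs actually look like, a small sketch such as the following could help. It is not from the thread; the function name `run_lengths` and its optional column argument are made up for illustration.

```shell
# Hypothetical helper (not from the thread): print "<absolute value> <run length>"
# for each consecutive run in the chosen column (default: column 11).
run_lengths() {
    awk -v col="${2:-11}" '
        NF == 0 { next }                        # skip blank lines
        {
            v = $col + 0; if (v < 0) v = -v     # absolute value of the column
            if (n > 0 && v != prev) {           # value changed: report the finished run
                print prev, n
                n = 0
            }
            prev = v
            n++
        }
        END { if (n > 0) print prev, n }        # report the last run
    ' "$1"
}
```

Calling `run_lengths whole_duplex_2.txt` prints one line per run, which makes it easy to compare the actual run lengths with the expected 4 or 8.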

#12
Posted 2018-05-16 09:39
Reply to #11 elaine2017

It works correctly when I test it with your data.

Your data file may be in the wrong format. DOS line endings, perhaps? Check it yourself.
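
One quick way to test the DOS-format guess, as a sketch: count the lines that end with a carriage return. The function name `count_crlf_lines` is an assumption made for illustration.

```shell
# Hypothetical helper: count lines that end with a carriage return (CR).
# A non-zero count suggests the file uses DOS (CRLF) line endings.
count_crlf_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}
```

With 19 columns, a stray carriage return would end up in the last field rather than in column 11, so a non-zero count here does not by itself account for a result of 0.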

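#13
Posted 2018-05-16 09:39
Reply to #5 wh7211

That doesn't seem right. My data has more than 330,000 lines in total, yet the final result is only 130 lines? I pulled that column out on its own and looked at it, and there are definitely more than that.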

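#14
Posted 2018-05-16 09:47
Reply to #12 blackold

The file is tab-separated, no mistake there.
Why don't you try this instead? Here are 22 lines; what I want are the runs where 4 or 8 consecutive values have the same absolute value.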
E00548:177:HKH53CCXY:4:2101:31629:73229 ATGCGTACCACA    TACCAGCAGTTC    163     chr21   5013083 0       138M    =       5013132 187     CACACAGGCACACTACGTGCACACATACTCACACCACACACATACAGCCTTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGT      JJJJJJJJJJJJJJJJJFF<JJJJJJJJJJJJJJJJJJJJJJJJFJJFJJAJJJAJJJJJJJJJAJ<JJJJ<JFJFJFJJJJJJJJ<7J-777FFF-A7AFJFJJAJJ<JFJJJJJJJJJJJJJJ<JAA777FJJ-F)      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44250277,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2214:23957:16516 TACCAGCAGTTC    ATGCGTACCACA    99      chr21   5013083 0       138M    =       5013132 187     CACACAGGCACACTACGTGCACACATACTCACACCACACACATACAGCCTTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGT      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJFJF      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44250277,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2101:31629:73229 ATGCGTACCACA    TACCAGCAGTTC    83      chr21   5013132 0       138M    =       5013083 -187    TTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGTTCCAGACACTTCCAGGGTTGGGGAGGGACGGCCACACCTCAGCCACAGG      7-FFJFFJJJFFF7F-F7FJJJJJJFAA-JJ7<JJJAJFA7-JJJFFJJFF<JJJAAJJJFFJJJJJJJJJJJJJJJFJJ<JJJFJJF7-JJJFFJJJFJJJJ<JJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44250228,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2214:23957:16516 TACCAGCAGTTC    ATGCGTACCACA    147     chr21   5013132 0       138M    =       5013083 -187    TTTTCTTCACACGTCTGAATCCTGATTGTCAGAGCAGCCACTTTTGGACTCAGCGGATGGGTCCCCTCTGGGGCTGATGGGCCGGGTGTTCCAGACACTTCCAGGGTTGGGGAGGGACGGCCACACCTCAGCCACAGG      JJJJJJAJAFAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44250228,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:1223:12246:18239 GACAGGTCATAC    CTCTCCTATAGC    163     chr21   5055301 0       138M    =       5055323 160     CCCACCCAGGTGTCCATCCACCTGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTG      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJAJ<JJJJJF<FJFJJJJJFAJJJJJAJJJJFJJJJAJFJJJJJJJJJJFJJJJJJJJFFJJJAFFFFJJJJJJJJJJJJJJJJA      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44208234,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2221:20872:34307 CTCTCCTATAGC    GACAGGTCATAC    99      chr21   5055301 0       138M    =       5055323 160     CCCACCCAGGTGTCCATCCACCTGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTG      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJ<JJFJJJJJJJJJJJJJJJJJFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44208234,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2221:21836:35203 CTCTCCTATAGC    GACAGGTCATAC    99      chr21   5055301 0       138M    =       5055323 160     CCCACCCAGGTGTCCATCCACCTGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTG      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJAJJ<FJJFJJJFJFJJJJJJJJJFAFJJJJAJFJJJJJJJJJFFFJFFJJJJJJJAJAJJJJJJJJAJJJFJJJJFJJ<      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44208234,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:1223:12246:18239 GACAGGTCATAC    CTCTCCTATAGC    83      chr21   5055323 0       138M    =       5055301 -160    TGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTGTGATAGGATCTTACCACATCTG      AJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44208212,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2221:20872:34307 CTCTCCTATAGC    GACAGGTCATAC    147     chr21   5055323 0       138M    =       5055301 -160    TGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTGTGATAGGATCTTACCACATCTG      AJJJJFJJJAFJJJJ<JJJJJJJJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44208212,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2221:21836:35203 CTCTCCTATAGC    GACAGGTCATAC    147     chr21   5055323 0       138M    =       5055301 -160    TGCCCTGGGGTCCCCTCCCCTCCTGTACCCTTGCAGCTCCCATGAGCCTCAGCCCCCTCCAAGCCCGTCTCCCTAGCAAAACCTTCCCTGAGACTCTAGCCCTTCCCTTTCTGCTGTGATAGGATCTTACCACATCTG      )JJJJFJJAFFJJJA-JJJJJJJJJJJJJJJJJJJJJFJJJJJJFJJJJJJJJJJJJAJJJFJJJJJFJFJJJJJFJFFF-JJJJJJJJJJFFJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44208212,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:1111:28798:6091  CACAAGTGTGGT    TGATACCGGACA    99      chr21   5057018 0       138M    =       5057025 145     TATAAAATGAACGCGCGTTCAAGATTTCCTTCAACTCATTGTTAGCGCAGAAACCGGTAAGATGTGCCAGCCAGGTCAAAGGAGAAGTGACAAAGGCACCTGTGTACGCGGAGTAAAGGGATACAGGTACGCTTCACA      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44206517,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2207:24444:55579 TGATACCGGACA    CACAAGTGTGGT    163     chr21   5057018 0       138M    =       5057025 145     TATAAAATGAACGCGCGTTCAAGATTTCCTTCAACTCATTGTTAGCGCAGAAACCGGTAAGATGTGCCAGCCAGGTCAAAGGAGAAGTGACAAAGGCACCTGTGTACGCGGAGTAAAGGGATACAGGTACGCTTCACA      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,-44206517,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:1111:28798:6091  CACAAGTGTGGT    TGATACCGGACA    147     chr21   5057025 0       138M    =       5057018 -145    TGAACGCGCGTTCAAGATTTCCTTCAACTCATTGTTAGCGCAGAAACCGGTAAGATGTGCCAGCCAGGTCAAAGGAGAAGTGACAAAGGCACCTGTGTACGCGGAGTAAAGGGATACAGGTACGCTTCACATACGAGG      FJFJJJJJJJJJJJJJJJJJJJJJJAAJAJJJJJJJJJJJJJJJFJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44206510,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2207:24444:55579 TGATACCGGACA    CACAAGTGTGGT    83      chr21   5057025 0       138M    =       5057018 -145    TGAACGCGCGTTCAAGATTTCCTTCAACTCATTGTTAGCGCAGAAACCGGTAAGATGTGCCAGCCAGGTCAAAGGAGAAGTGACAAAGGCACCTGTGTACGCGGAGTAAAGGGATACAGGTACGCTTCACATACGAGG      7JF<JJJJJJJJJJJJJJJJJJJJJJJJJJAAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:138        XA:Z:chr21,+44206510,138M,0;    RG:Z:L004
E00548:177:HKH53CCXY:4:2102:4087:61837  TCAAGGAGAACC    AGCGGATGAGTA    163     chr21   5062947 48      138M    =       5062976 167     GGCTGGTCTCGAACTCCTGATCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGA      JJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,-44199064,138M,1;    RG:Z:L004
E00548:177:HKH53CCXY:4:2102:4087:61837  TCAAGGAGAACC    AGCGGATGAGTA    163     chr21   5062947 48      138M    =       5062976 167     GGCTGGTCTCGAACTCCTGATCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGA      JJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,-44199064,138M,1;    RG:Z:L004
E00548:177:HKH53CCXY:4:2110:5538:26273  AGCGGATGAGTA    TCAAGGAGAACC    99      chr21   5062947 48      138M    =       5062976 167     GGCTGGTCTCGAACTCCTGATCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTGCAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGA      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ-<FJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJF      NM:i:1  MD:Z:64A73      AS:i:133        XS:i:128        XA:Z:chr21,-44199064,138M,2;    RG:Z:L004
E00548:177:HKH53CCXY:4:2110:5467:26431  AGCGGATGAGTA    TCAAGGAGAACC    99      chr21   5062947 48      138M    =       5062976 167     GGCTGGTCTCGAACTCCTGATCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTGCAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGA      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ-<AJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJ      NM:i:1  MD:Z:64A73      AS:i:133        XS:i:128        XA:Z:chr21,-44199064,138M,2;    RG:Z:L004
E00548:177:HKH53CCXY:4:2102:4087:61837  TCAAGGAGAACC    AGCGGATGAGTA    83      chr21   5062976 48      138M    =       5062947 -167    ATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGAGAAACCACAACGCTGTATGAAGTCCACTG      AJJJJFFJJJJJJJFJJJJJFFJJJJJAFJFJJJJJJJJJFFJJFFJJJJJJJFAAJJFJJJJJJFJJJJAJJJFJJAJF<F<JJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJFFFF<JJJJFJJJJJFJJAJJFJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,+44199035,138M,1;    RG:Z:L004
E00548:177:HKH53CCXY:4:2102:4087:61837  TCAAGGAGAACC    AGCGGATGAGTA    83      chr21   5062976 48      138M    =       5062947 -167    ATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGAGAAACCACAACGCTGTATGAAGTCCACTG      AJJJJFFJJJJJJJFJJJJJFFJJJJJAFJFJJJJJJJJJFFJJFFJJJJJJJFAAJJFJJJJJJFJJJJAJJJFJJAJF<F<JJJJJJJJFJJJJFJJJJJJJJJJJJJJJJJFFFF<JJJJFJJJJJFJJAJJFJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,+44199035,138M,1;    RG:Z:L004
E00548:177:HKH53CCXY:4:2110:5538:26273  AGCGGATGAGTA    TCAAGGAGAACC    147     chr21   5062976 48      138M    =       5062947 -167    ATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGAGAAACCACAACGCTGTATGAAGTCCACTG      FJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,+44199035,138M,1;    RG:Z:L004
E00548:177:HKH53CCXY:4:2110:5467:26431  AGCGGATGAGTA    TCAAGGAGAACC    147     chr21   5062976 48      138M    =       5062947 -167    ATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGCGCCACCGCACCCGGCATAAAAATTATTTCTTAATAACTCTTGTATTACTATCACAAAAGACTGAGAAACCACAACGCTGTATGAAGTCCACTG      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJ      NM:i:0  MD:Z:138        AS:i:138        XS:i:133        XA:Z:chr21,+44199035,138M,1;    RG:Z:L004

#15
Posted 2018-05-16 09:52
Reply to #14 elaine2017

No need to waste time.

See:
http://bbs.chinaunix.net/forum.p ... p;highlight=problem

#16
Posted 2018-05-16 12:20
Last edited by wh7211 on 2018-05-16 12:22

Reply to #13 elaine2017

In posts 1, 11 and 14 you gave different sample texts and different requirements:
Post 1: no blank lines in the text; column 4; runs of 4 equal absolute values
Post 11: blank lines in the text; column 11; runs of 4 or 8 equal absolute values
Post 14: no blank lines in the text; column 11; runs of 4 or 8 equal absolute values
Please state all of your requirements in one go.

The code below works on column 11, selects runs of exactly 4 equal absolute values, and skips blank lines in the text:
awk '/^$/{next}{a=sqrt($11*$11);if(!b[a]++){if(c==4){print d;d=""}else{d=""}};c=b[a];d=d?d"\n"$0:$0}END{if(c==4){print d}}' file
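
The one-liner above keeps its counter `b` per value for the whole file, so an absolute value that shows up again further down, in lines that are not adjacent, is added to the earlier count. If that matters for the real data (an assumption; the thread does not confirm it), a variant that compares each line only with the previous one, and accepts runs of 4 or 8, might look like the sketch below. The function names `runs_of_4_or_8` and `emit_run` are hypothetical.

```shell
# Hypothetical helper: print every consecutive run whose length is 4 or 8,
# comparing absolute values in the chosen column (default: column 11).
runs_of_4_or_8() {
    awk -v col="${2:-11}" '
        # Print the buffered run if its length qualifies, then reset the buffer.
        function emit_run() {
            if (n == 4 || n == 8) printf "%s", buf
            buf = ""
            n = 0
        }
        NF == 0 { next }                        # skip blank lines
        {
            v = $col + 0; if (v < 0) v = -v     # absolute value of the column
            if (n > 0 && v != prev) emit_run()  # value changed: close the run
            prev = v
            n++
            buf = buf $0 "\n"
        }
        END { emit_run() }                      # close the last run
    ' "$1"
}
```

On the 22-line sample from post 14, whose runs have lengths 4, 6, 4 and 8, this would keep 16 lines and drop the six lines with 160 / -160.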

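#17
Posted 2018-05-16 12:32
Reply to #16 wh7211

Sorry, I didn't explain it clearly. The raw data has so many columns that I was afraid it would look too complicated, so I posted a simplified version when I asked. As for whether it is column 4 or column 11, I figured I could adjust that myself when the time came. The original data contains no blank lines.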

#18
Posted 2018-05-16 12:38
Reply to #9 elaine2017

I tested the data from post 3 with the code from post 4 and found no problem. What result do you get when you run it?

#19
Posted 2018-05-16 14:03
Reply to #13 elaine2017

"That doesn't seem right. My data has more than 330,000 lines in total, yet the final result is only 130 lines? I pulled that column out on its own and looked at it, and there are definitely more than that."

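1. Convert the original text to Unix format: run "dos2unix file".
2. Check whether every line of the original text has the same number of columns: run "awk '{a++;if(NF==19){b++}}END{print a,b}' file", where a is the total number of lines and b is the number of lines that have exactly 19 columns.
3. Check that the values in column 11 of the original text are correct.
4. Make sure the sample text you provided fully covers the characteristics of the original text.
5. The code in posts 5 and 16 only handles the case of runs of exactly 4 equal absolute values.
6. If the original text has no blank lines, use the code in post 5; if it has blank lines, use the code in post 16.
7. If you still think something is wrong, make a simplified version of the text and post the result so we can look at it.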
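
Step 2 can be widened a little, as a sketch: instead of counting only the 19-column lines, tabulate every column count that occurs. The function name `field_count_histogram` is an illustrative assumption, not something from the thread.

```shell
# Hypothetical helper: print "<number of fields> <number of lines>" pairs,
# so that any line without the expected 19 columns stands out.
field_count_histogram() {
    awk '{ count[NF]++ } END { for (nf in count) print nf, count[nf] }' "$1" | sort -n
}
```

A single output line starting with 19 would mean every line has 19 columns. Note that awk's default field splitting merges consecutive whitespace, so for tab-separated data with empty fields `-F '\t'` would need to be added to the awk call.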