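This post was last edited by Windows19 on 2017-06-27 10:01
Reply to #6 本友会机友会摄友会
Like this: process 100 GB of text with 2 GB of memory; efficiency also matters.
Using exact matching on the data in the text, count how many times each letter string and digit string repeats, then sort from most to fewest and print the whole lines.
Below is the test text. Letter strings and digit strings must be matched exactly and counted. The highlighted parts (kmerw and 6876788768796564743) are representative letter strings and digit strings, for example in these 2 lines: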
kmerw/'\35/trt
;\';3K452;LRNJKJlk35h42hrh6876788768796564743drgdsgrdg
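Count how many times each line's keyword occurs in the whole text (the same string occurring several times within a single line counts as one). Counting every occurrence would also be acceptable.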
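One possible reading of "exact match" is that a key is a maximal run of ASCII letters or a maximal run of digits, with every other character acting as a separator. The sketch below assumes that reading; the names `TOKEN_RE` and `line_keys` are made up for illustration. Returning a set is what makes a key that repeats inside one line count only once.

```python
import re

# Assumption: a key is a maximal run of ASCII letters or of digits;
# punctuation, backslashes and spaces only separate keys.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")

def line_keys(line):
    """Return the distinct letter runs and digit runs found in one line."""
    return set(TOKEN_RE.findall(line))

# One of the two example lines from the post.
print(sorted(line_keys(r"kmerw/'\35/trt")))  # ['35', 'kmerw', 'trt']
```

Under this reading `lhkfnereefyftyu` is a different key from `lhkfnereef`, so whether such lines should be grouped together depends on how strictly "exact" is meant.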
a.txt
-
- .mllm;ml;k'\;.\\;t.\t
- ,tge,gery,g ,y; le
- JHBBJHHBJBHJ
- lhkfnereef345E'\43\E RTY
-
- a a a a aa a a a a
- ertfertgfe
-
-
- 56345
- 34435435
- lhkfnereef/'34'534'\TRER\DR
- RGHRERGY
- 34543543
- a a a a aa
- rthgrtdgyrttg
- g g g g g g g
- 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
- ,+%(?>>._{{}$##[]\\`{@@"/<&~'[])
- /'.\.\/;\']][lhkfnereef
- /'\']'JRJLKLKWRNLK/'\.'][
- /'\\'/'\JRJLKLKWRNLK/''\;/'\
- /\']8465872/']\']
- /\\'\'kmerw/\][
- 15458875224455
- 325541125553224*6321755+5085/535365/
- 45'\/\'r\'34''\jk32lhkfnereef456435345353ert/'\35[34\5;3et'\/.we\sert
- 456456456kmerw464564
- 456456kmerw56456456
- 45648684 55465545455
- 456646464 45645646546545546
- 56456546kmerw'/\4\6
- 6876788768796564743.';etg]4ep
- 6876788768796564743FDGD/\\.'..G
- 6876788768796564743RDNGK35\;\45;\4te\r
- 6876788768796564743erte'.;3p4[]]koj k
- 6876788768796564743etg./er\';34]6;t.g;re
- 6876788768796564743ygtr/\'345[t;ert.;g
- 8465872/'\'\'/\/'
- :%>(()??,(+?(:%%>()?>>:%.>?())
- ;'l;l LK;35JO23JRJLKLKWRNLK/''\'34\6/43erstw46546
- ;\';3K452;LRNJKJlk35h42hrh6876788768796564743drgdsgrdg
- JKLHKHKJGHYJGHUJG
- JRJLKLKWRNLK/\.]\[
- dgytrhhjuuyttfgghh
- hkjhbkhKJHKJHKJjhJjJjJ
- hkkkhkhkjkhjhgkk
- kmerw/'\35/trt
- kmerw45';3\][5435
- kmerw462444'4]6[4433
- kmerw76545
- lhkfnereef
- lhkfnereef/'\76\FH TY\ UTR76 T67U567H
- lhkfnereefyftyu/'\J TUYRT75,76 ER6UY 5;7;RDT;\467
- vhgghhuhhj fggg.n
- w,er;lwekjhy4u2hkewhrft8465872poio LKJLuhu432r456dsy'
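Correct result (letter strings and digit strings with a repeat count of 1 may appear in any order)
After processing, the output should be sorted like this: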
log.txt
- kmerw462444'4]6[4433
- 56456546kmerw'/\4\6
- kmerw/'\35/trt
- kmerw45';3\][5435
- 456456kmerw56456456
- 456456456kmerw464564
- kmerw76545
- /\\'\'kmerw/\][
- 6876788768796564743RDNGK35\;\45;\4te\r
- 6876788768796564743erte'.;3p4[]]koj k
- 6876788768796564743.';etg]4ep
- 6876788768796564743etg./er\';34]6;t.g;re
- ;\';3K452;LRNJKJlk35h42hrh6876788768796564743drgdsgrdg
- 6876788768796564743ygtr/\'345[t;ert.;g
- 6876788768796564743FDGD/\\.'..G
- 45'\/\'r\'34''\jk32lhkfnereef456435345353ert/'\35[34\5;3et'\/.we\sert
- lhkfnereef/'34'534'\TRER\DR
- w,er;lwekjhy4u2hkewhrft8465872poio LKJLuhu432r456dsy'
- lhkfnereef345E'\43\E RTY
- lhkfnereef/'\76\FH TY\ UTR76 T67U567H
- lhkfnereef
- /'.\.\/;\']][lhkfnereef
- ;'l;l LK;35JO23JRJLKLKWRNLK/''\'34\6/43erstw46546
- lhkfnereefyftyu/'\J TUYRT75,76 ER6UY 5;7;RDT;\467
- ,tge,gery,g ,y; le
- JRJLKLKWRNLK/\.]\[
- /'\\'/'\JRJLKLKWRNLK/''\;/'\
- /'\']'JRJLKLKWRNLK/'\.'][
- g g g g g g g
- .mllm;ml;k'\;.\\;t.\t
- 8465872/'\'\'/\/'
- /\']8465872/']\']
- 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
- a a a a aa
- a a a a aa a a a a
- jd,.434;\2['lkmerw5465fghfdh
- 325541125553224*6321755+5085/535365/
- vhgghhuhhj fggg.n
- 456646464 45645646546545546
- 45648684 55465545455
- hkkkhkhkjkhjhgkk
- hkjhbkhKJHKJHKJjhJjJjJ
- dgytrhhjuuyttfgghh
- JKLHKHKJGHYJGHUJG
- 15458875224455
- rthgrtdgyrttg
- 34543543
- RGHRERGY
- 34435435
- 56345
- ertfertgfe
- JHBBJHHBJBHJ
- :%>(()??,(+?(:%%>()?>>:%.>?())
- ,+%(?>>._{{}$##[]\\`{@@"/<&~'[])
-
-
-
-
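Just write it however you see fit.
I only know a little PowerShell, and I also have a MySQL database.
If that would work, I'm willing to give it a try too.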
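For what it's worth, the counting and sorting could be sketched in Python roughly as follows. This is only one interpretation of the requirement, not a confirmed solution: a key is assumed to be a maximal run of letters or of digits, each key is counted once per line, and a line is ranked by the count of its most frequent key. The function name `rank_lines` is hypothetical. The version below keeps everything in memory, so it only illustrates the logic on a small sample; for 100 GB of text with 2 GB of memory the counts and the final sort would have to be pushed to disk, for instance an external sort or the MySQL database.

```python
import re
from collections import Counter

# Assumption: a key is a maximal run of ASCII letters or of digits.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")

def rank_lines(lines):
    """Sort lines by the line count of their most frequent key, descending."""
    lines = [ln.rstrip("\n") for ln in lines]

    # Pass 1: for each key, count the number of lines that contain it.
    # Using a set per line means repeats inside one line count once.
    freq = Counter()
    for ln in lines:
        freq.update(set(TOKEN_RE.findall(ln)))

    # Pass 2: score each line by its most frequent key (0 if it has none).
    def score(ln):
        return max((freq[k] for k in TOKEN_RE.findall(ln)), default=0)

    # sorted() is stable, so lines with equal scores keep their input order.
    return sorted(lines, key=score, reverse=True)

sample = ["kmerw76545", "RGHRERGY", "456456kmerw56456456", r"kmerw/'\35/trt"]
for ln in rank_lines(sample):
    print(ln)
```

On this four-line sample the three lines containing `kmerw` come first and `RGHRERGY` comes last. The sketch does not try to reproduce the exact order of log.txt, since lines with equal counts may appear in any order.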