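I have a file containing more than 100 million records, and I want to deduplicate it with shell commands.
The file is a little over 1 GB. I placed it in /dev/shm (in memory) for the operation.
The output does not need to be sorted, so
the shell command I used is: time awk '{a[$0]++}END {for (b in a) print b}' all.txt > result.txt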
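For comparison, a commonly used awk idiom gives the same set of unique lines while printing each line the first time it is seen, so no END loop is needed and input order is preserved. This is only a sketch: the sample data and the array name `seen` are placeholders, and the idiom still keeps every distinct line in memory, so it is not claimed to be faster on the real file.

```shell
# Hypothetical sample data standing in for all.txt
tmpdir=$(mktemp -d)
printf 'b\na\nb\nc\na\n' > "$tmpdir/all.txt"

# Print a line only when its count is still zero, then increment the count.
# "seen" is an arbitrary array name.
awk '!seen[$0]++' "$tmpdir/all.txt" > "$tmpdir/result.txt"

cat "$tmpdir/result.txt"
```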
The time taken was:
real 17m25.798s
user 4m51.860s
sys 0m56.204s
Checking with top shows that the CPUs are not fully used: only 2 of the 24 CPUs are doing any work.
Cpu0 : 65.1%us, 7.0%sy, 0.0%ni, 6.0%id, 7.0%wa, 2.3%hi, 12.6%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 7.0%us, 6.0%sy, 0.0%ni, 83.4%id, 0.7%wa, 0.0%hi, 3.0%si, 0.0%st
Cpu13 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24676464k total, 12165268k used, 12511196k free, 4292k buffers
Swap: 65537128k total, 3203120k used, 62334008k free, 639556k cached
An excerpt from the vmstat log:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 1 3148496 72432 1520 2010992 3604 0 3608 36 19218 14417 1 3 94 3 0
0 1 3156200 74964 1520 2011452 3456 6284 3456 6284 23896 17482 1 1 95 3 0
1 2 3156200 73756 1412 2012688 2208 0 2208 516 23212 16970 1 1 95 3 0
0 1 3156200 72924 1296 2014832 3200 0 3200 4 20779 15540 1 2 94 3 0
0 3 3169980 76496 1292 2003332 1104 26336 1108 26336 20505 14623 0 2 91 6 0
0 1 3169980 75208 1300 2005348 2288 0 2288 16 25333 17701 0 1 96 3 0
0 1 3169980 75236 1300 2006952 3476 0 3476 0 25376 18388 0 1 96 3 0
0 1 3169980 73032 1320 2007964 3468 0 3468 152 18096 13705 1 3 94 3 0
2 0 3169980 72576 1312 2010360 3264 0 3264 0 24435 17557 1 1 95 3 0
0 1 3185528 74964 1312 2009920 2436 17336 2436 17336 23564 17556 1 2 95 3 0
0 1 3185528 74912 1312 2011908 3060 0 3060 0 25487 17752 1 1 96 3 0
0 1 3185528 75132 1320 2013412 3652 0 3652 16 25509 18001 1 1 96 3 0
1 1 3185528 74396 1324 2015852 3180 0 3180 40 23331 16731 1 1 95 3 0
2 0 3185528 72796 1324 2017584 3376 0 3376 0 19644 15047 1 2 94 3 0
0 1 3185528 72912 1324 2018688 3432 0 3432 0 25494 18311 1 1 96 3 0
1 0 3196468 74576 1320 2013060 3076 3384 3076 3384 23335 17313 1 2 95 3 0
0 1 3196468 72616 1324 2014172 3356 0 3356 4 19660 14426 1 2 94 3 0
0 1 3196468 72204 1328 2015852 3136 0 3136 16 23394 17322 1 2 95 3 0
1 1 3204036 75440 1328 2014432 2728 6168 2728 6168 24332 17497 0 1 95 3 0
1 1 3204036 74292 1332 2016176 2884 0 2884 16 22251 16466 1 2 95 3 0
0 1 3204036 73572 1328 2017252 3004 0 3004 0 22478 16262 1 2 95 3 0
0 1 3212884 76392 1344 2014596 2940 5252 2940 5388 20611 15664 1 2 94 3 0
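One way to read a log like this at a glance is to average the si and so columns (fields 7 and 8, memory swapped in and out per second, in KiB by default). Sustained nonzero values usually mean the machine is paging. The sketch below uses three rows copied from the excerpt; the variable names are illustrative only.

```shell
# Three rows copied from the vmstat excerpt, header lines included
vmstat_log='procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 3156200 74964 1520 2011452 3456 6284 3456 6284 23896 17482 1 1 95 3 0
1 2 3156200 73756 1412 2012688 2208 0 2208 516 23212 16970 1 1 95 3 0
0 1 3156200 72924 1296 2014832 3200 0 3200 4 20779 15540 1 2 94 3 0'

# Skip header lines (first field is not a number), then average si ($7) and so ($8)
summary=$(printf '%s\n' "$vmstat_log" | awk '
  $1 ~ /^[0-9]+$/ { si += $7; so += $8; n++ }
  END { if (n) printf "samples=%d avg_si=%.0f avg_so=%.0f\n", n, si/n, so/n }')

echo "$summary"
```

To apply this to a live capture, the output of a command such as `vmstat 1 30` could be piped into the same awk program in place of the embedded sample.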
How can I determine where the system bottleneck is? Is it I/O?
What can I change to shorten the deduplication time?