As the title says: I need to work out how many requests each IP makes and what each IP requests.
61.4.184.92 - - [05/Dec/2013:00:10:05 +0800] "GET /data/?areaid=329103324&type=observe&date=201312042349&appid=7fde98&key=BUyFU0GyXhzGNDIFEDMQaortggDQ= HTTP/1.1" 200 76 "-" "Dalvik/1.6.0 (Linux; U; Android 4.1.1; MI 2S MIUI/JLB23.0)" -
61.4.184.91 - - [05/Dec/2013:00:10:05 +0800] "GET /data/?areaid=101243506&type=observe&date=201301012336&appid=7cfdf9&key=oMqeris3J3IZ3CHkbOKd06X5NYg= HTTP/1.1" 200 77 "-" "Dalvik/1.4.0 (Linux; U; Android 4.0; US900G Build/GRK39F)" -
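Given sample data like the above, I want to:
1. Count the number of requests per user (users are distinguished by the value of the appid field).
2. List the areaid and type values each user requested.
3. Count the number of requests per IP.
Each day's log is 5 GB or more. How should data of this size be processed?
I asked in the shell board and was pointed to awk, but it is rather slow. Would perl be faster?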
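On the size question, one common approach is to stream the file line by line, so memory grows with the number of distinct IPs rather than with the size of the log. Below is a minimal Python sketch of the per-IP count (item 3). It assumes the IP is always the first space-separated field, as in the sample lines, and it counts every line, not only those carrying appid/areaid/type. The name `count_ips` is made up for illustration, and Python is used only as an example language here; nothing in this sketch shows it to be faster than awk or perl.

```python
import sys
from collections import Counter


def count_ips(lines):
    """Count requests per client IP, consuming one line at a time."""
    hits = Counter()
    for line in lines:
        # Assumption: the IP is the first space-separated field.
        ip, sep, _rest = line.partition(" ")
        if sep:  # skip blank or malformed lines that contain no space
            hits[ip] += 1
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ips.py access.log
    # A file object yields lines lazily, so the whole log is never held in memory.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ip, n in count_ips(log).most_common():
            print(ip, n)
```

Because `count_ips` accepts any iterable of lines, it can be fed an open file, `sys.stdin`, or a short list for testing.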
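The awk code (it needs gawk, because gensub is a gawk extension):
# Only consider lines that mention all three query parameters.
/appid/&&/areaid/&&/type/{
    # Each gensub() call extracts the value of one parameter from the line.
    appid=gensub(/.*appid=([^&]*).*/,"\\1",1);
    areaid=gensub(/.*areaid=([^&]*).*/,"\\1",1);
    type=gensub(/.*type=([^&]*).*/,"\\1",1);
    IP=$1;
    a[appid]++;    # requests per appid
    # Append an areaid/type to the appid's list the first time it is seen.
    if(!x[appid,areaid]++)b[appid]=b[appid]?b[appid]" "areaid:areaid;
    if(!y[appid,type]++)c[appid]=c[appid]?c[appid]" "type:type;
    d[IP]++;       # requests per IP (matching lines only)
}
END{
    for(i in a)printf "appid:\t%s\ntimes:\t%d\nareaid:\t%s\ntype:\t%s\n\n",i,a[i],b[i],c[i];
    for(i in d)print i,d[i];
}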
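For comparison, here is a rough Python sketch of the same per-appid bookkeeping (items 1 and 2). It parses the query string once per line instead of running three separate regex substitutions. It is not a drop-in replacement: it assumes the request is the first double-quoted field, it takes the first occurrence of each parameter, and `parse_qs` ignores parameters with empty values. The names `summarize` and `report` are made up for illustration.

```python
import sys
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def summarize(lines):
    """Per appid: request count plus the distinct areaid and type values."""
    stats = defaultdict(lambda: {"times": 0, "areaid": {}, "type": {}})
    for line in lines:
        # Assumption: the request is the first double-quoted field,
        # e.g. "GET /data/?areaid=...&appid=... HTTP/1.1".
        parts = line.split('"', 2)
        if len(parts) < 3:
            continue
        request = parts[1].split()
        if len(request) < 2:
            continue
        query = parse_qs(urlsplit(request[1]).query)
        # Keep only requests that carry all three parameters.
        if not all(k in query for k in ("appid", "areaid", "type")):
            continue
        entry = stats[query["appid"][0]]
        entry["times"] += 1
        # Dicts preserve insertion order, so values stay in first-seen order.
        entry["areaid"][query["areaid"][0]] = True
        entry["type"][query["type"][0]] = True
    return stats


def report(stats, out=sys.stdout):
    """Print one block per appid in the same layout as the awk END rule."""
    for appid, e in stats.items():
        out.write("appid:\t%s\ntimes:\t%d\nareaid:\t%s\ntype:\t%s\n\n"
                  % (appid, e["times"], " ".join(e["areaid"]), " ".join(e["type"])))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        report(summarize(log))
```

Whether this beats the gawk version on a 5 GB file would have to be measured. The sketch only illustrates doing the extraction in one parse per line.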