Question: how to read User-Agent values from logs quickly

Posted 2010-07-26 09:22

Hi all, I recently wrote a simple script that reads the User-Agent values out of our logs,
but it is very slow and inefficient:

    #!/usr/bin/env python
    # coding=gbk
    import glob


    def clear(filename):
        # Drop duplicate lines from filename and append the rest to u-agent.txt.
        print("begin")
        with open(filename) as src:
            k = src.readlines()
        # Iterate over a reversed copy so that removing from k is safe.
        # count() and remove() each scan the whole list, so this loop is
        # quadratic in the number of lines.
        for i in k[::-1]:
            if k.count(i) > 1:
                k.remove(i)
                print("removed %s" % i)
        with open("u-agent.txt", "a+") as g:
            g.writelines(k)


    def check(path):
        for i in glob.glob(path):
            print("%s handled" % i)
            li = []
            with open(i) as matchme:
                for line in matchme:
                    if line.startswith("#"):
                        continue  # skip the "#Software", "#Fields" header lines
                    try:
                        # cs(User-Agent) is field 7 in the #Fields header of the
                        # log shown below; adjust if your logs have more fields.
                        g = line.split(" ")[7]
                        if g != "-":  # compare by value; "is not" tests identity
                            li.append(g + "\n")
                    except (IOError, ZeroDivisionError, IndexError) as args:
                        print("has error:%s" % args)
            with open("useragent.txt", "a+") as f:
                f.writelines(li)
            # Re-reads and re-deduplicates the whole file after every log.
            clear("useragent.txt")


    if __name__ == "__main__":
        print(" check for useragent!")
        check(r".\log\*.log")
The log content looks like this:

    #Software: Microsoft Internet Information Services 6.0
    #Version: 1.0
    #Date: 2010-07-17 15:00:00
    #Fields: date time cs-method cs-uri-stem cs-uri-query s-port c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status sc-bytes time-taken
    2010-07-17 14:59:59 GET /manutd/index.aspx sid=&cin=82974&waped=2&gaid=Tm9raWFOb2tpYSBOODE%3d 80 117.136.16.74 NOKIANokia+N81/UCWEB7.1.0.42/28/999 http://3g.cn 302 0 0 553 1000
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=1007119299193953713&waped=2&gaid=&nid=274087 80 221.131.143.50 - - 200 0 64 0 843
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=03AA378C0C7&waped=2&gaid=Tm9raWE1ODAw&nid=273996 80 117.136.0.141 Nokia5800+XpressMusic/UCWEB7.2.2.51/50/800 http://sports.3g.cn/TopIndex.aspx?sid=03AA378C0C7&cin=83042&waped=2&gaid=Tm9raWE1ODAw 200 0 0 14970 7500
    2010-07-17 14:59:59 GET /lottery/lotteryInterface/LotteryOpenChange.aspx time=2010-07-17%2023:01:24&lottery=57&cid=1&sid=1&imei=JS11279257799603145160937902&mob=|13265718840| 80 112.97.30.1 Nokia5300/2.0+(05.51)+Profile/MIDP-2.0+Configuration/CLDC-1.1++UNTRUSTED/1.0 - 200 0 64 0 46
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=03AA3790483&ftp=155&nid=273744 80 203.208.60.82 SAMSUNG-SGH-E250/1.0+Profile/MIDP-2.0+Configuration/CLDC-1.1+UP.Browser/6.2.3.3.c.1.101+(GUI)+MMP/2.0+(compatible;+Googlebot-Mobile/2.1;++http://www.google.com/bot.html) - 200 0 0 13081 390
    2010-07-17 14:59:59 GET /manutd/NewsContent.aspx spec=287&nid=447608&pn=3&npn=4 80 123.125.66.128 Baiduspider+(+http://www.baidu.com/search/spider.htm) - 302 0 64 0 31
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=00AA3696B5E&waped=2&gaid=Tm9raWFOb2tpYSBOODE%3d&nid=274087&npn=4&mob=|13068901136| 80 112.97.30.1 NOKIANokia+N81/UCWEB7.1.0.42/28/800 - 200 0 64 0 218
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=1006172036456954826&waped=2&gaid=Tm9raWE1ODAw&nid=273951&npn=2 80 119.141.209.237 Nokia5800+XpressMusic/UCWEB7.2.2.51/50/800 http://sports.3g.cn/nba/NewsContent.aspx?sid=1006172036456954826&waped=2&gaid=Tm9raWE1ODAw&nid=273951 200 0 0 12820 171
    2010-07-17 14:59:59 GET /manutd/NewsContent.aspx spec=287&nid=438799&pn=4 80 123.125.66.119 Baiduspider+(+http://www.baidu.com/search/spider.htm) - 302 0 64 0 31
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx nid=273705&sid=023A21D7212&wid=&pz=&waped=3&gaid= 80 218.202.106.201 SonyEricssonZ558c/R4FA+Java/SEMC-Java/2.0+Profile/MIDP-2.0+Configuration/CLDC-1.1+UNTRUSTED/1.0 - 200 0 64 0 937
    2010-07-17 14:59:59 GET /NBA/NewsContent.aspx sid=1007344915857254217&wid=&pz=&waped=2&gaid=Tm9raWExNjgwYw%3d%3d&nid=274094&npn=5 80 211.139.145.106 - - 200 0 0 7630 140
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx nid=273960&sid=03BA389A0F4&wid=&pz=&waped=2&gaid=Tm9raWE1MjMw 80 117.136.29.57 Nokia5230/UCWEB7.2.2.51/50/800 http://sports.3g.cn/nba/lakers/index.aspx?sid=03BA389A0F4&cin=83200&waped=2&gaid=Tm9raWE1MjMw 200 0 0 15097 6593
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=1007396491105954217&waped=2&gaid=VU5UUlVTVEVE&nid=274087&npn=1&act=rest&t=92587 80 218.202.106.201 UNTRUSTED/1.0 - 200 0 64 0 5937
    2010-07-17 14:59:59 GET /NewsContent.aspx nid=463669&sid=009A3899B8F&wid=&pz=&gaid=&waped= 80 211.138.100.173 - - 200 0 0 15144 6125
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=1007315874395954617&waped=2&gaid=Tm9raWE3Mzcw&nid=273966&npn=2 80 211.140.16.2 - http://sports.3g.cn/nba/NewsContent.aspx?sid=1007315874395954617&waped=2&gaid=Tm9raWE3Mzcw&nid=273966 200 0 0 13776 93
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=1007736589706255317&waped=2&gaid=&nid=273993&npn=1&act=rest 80 211.139.145.106 - - 200 0 0 20521 703
    2010-07-17 14:59:59 GET /nba/index.aspx cin=21&sid=1007381359164254814&waped=2&gaid=TUFVSSBXQVAgQnJvd3Nlcg%3d%3d&rd=27&mob=|13112861246| 80 112.96.28.7 MAUI+WAP+Browser - 200 0 0 19932 296
    2010-07-17 14:59:59 GET /NewsContent.aspx nid=463670&sid=028A26729D6&wid=&pz=&gaid=&waped= 80 221.179.8.50 - - 200 0 0 11178 171
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=00AA30CFB7F&waped=2&gaid=Tm9raWFFNzE%3d&nid=273894 80 117.136.10.186 NokiaE71/UCWEB7.2.2.51/28/800 http://sports.3g.cn/TopIndex.aspx?sid=00AA30CFB7F&cin=83327&waped=2&gaid=Tm9raWFFNzE%3d 200 0 0 18206 3500
    2010-07-17 14:59:59 GET /NewsContent.aspx sid=00EA31CDBC5&waped=2&gaid=T3BlcmE%3d&nid=463682&npn=2 80 59.151.106.240 Opera/9.80+(J2ME/MIDP;+Opera+Mini/4.2.20055/19.828;+U;+zh)+Presto/2.5.25 - 200 0 0 12823 62
    2010-07-17 14:59:59 GET /nba/NewsContent.aspx sid=00AA389C344&waped=2&gaid=Tm9raWE1MjMz&nid=273894 80 117.136.21.138 Nokia5233/UCWEB7.3.0.55/50/999 http://sports.3g.cn/TopIndex.aspx?sid=00AA389C344&cin=83042&waped=2&gaid=Tm9raWE1MjMz 200 0 0 18029 3531
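Because the `#Fields:` header names every column, the position of `cs(User-Agent)` can be looked up instead of hard-coded. A minimal sketch in Python 3, assuming the fields are separated by spaces as in the sample above. `field_index` is a hypothetical helper name for illustration, not part of the original script:

```python
def field_index(header_line, name="cs(User-Agent)"):
    """Return the column index of `name` in a W3C '#Fields:' header line."""
    prefix = "#Fields:"
    if not header_line.startswith(prefix):
        raise ValueError("not a #Fields header: %r" % header_line)
    columns = header_line[len(prefix):].split()
    return columns.index(name)  # raises ValueError if the column is absent


# Header copied from the sample log.
header = ("#Fields: date time cs-method cs-uri-stem cs-uri-query s-port c-ip "
          "cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status "
          "sc-bytes time-taken")
ua_column = field_index(header)
print(ua_column)  # 7
```

For the sample log this gives column 7, so `line.split(" ")[7]` picks out the User-Agent on each data line. Logs that record a different set of fields would give a different index.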
I hope someone can help me make this faster and avoid the problems in my script above. Many thanks!
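One possible way to speed this up is to keep the values already seen in a set, so each duplicate check is a hash lookup rather than a scan of the whole list, and to write the result once at the end instead of re-reading the output after every log file. The following is a sketch in Python 3 under stated assumptions: the User-Agent is field 7, as in the sample log, and the names `iter_user_agents`, `unique_user_agents`, and `ua_index` are made up for illustration.

```python
import glob


def iter_user_agents(lines, ua_index=7):
    """Yield the User-Agent field of each data line, skipping headers and '-'."""
    for line in lines:
        if line.startswith("#"):
            continue  # W3C header line such as '#Fields: ...'
        fields = line.split(" ")
        if len(fields) <= ua_index:
            continue  # short or malformed line
        agent = fields[ua_index].strip()
        if agent and agent != "-":
            yield agent


def unique_user_agents(pattern, ua_index=7):
    """Return the distinct User-Agent values from all files matching pattern.

    Files are visited in sorted name order; values keep first-seen order.
    """
    seen = set()
    ordered = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps going if a log contains undecodable bytes.
        with open(path, errors="replace") as log:
            for agent in iter_user_agents(log, ua_index):
                if agent not in seen:
                    seen.add(agent)
                    ordered.append(agent)
    return ordered
```

Calling `unique_user_agents(r".\log\*.log")` and writing the returned list to `u-agent.txt` in a single pass would stand in for both `check` and `clear`. Each line is read once and the intermediate `useragent.txt` is no longer needed.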