Chinaunix

Title: Regex match search

Author: youzhengyu    Time: 2018-03-29 10:31
Nov 18 15:06:57 192.168.10.111   *Nov 18 15:16:14: %LINK-3-UPDOWN: Interface GigabitEthernet 0/15, changed state to down.
Nov 18 15:06:57 192.168.10.111   *Nov 18 15:16:14: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet 0/15, changed state to down.
Nov 18 15:06:58 192.168.10.111   *Nov 18 15:16:14: %LINK-3-UPDOWN: Interface GigabitEthernet 0/11, changed state to down.
Nov 18 15:06:58 192.168.10.111   *Nov 18 15:16:14: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet 0/11, changed state to down.
Nov 18 15:06:58 192.168.10.111   *Nov 18 15:16:14: %LINK-3-UPDOWN: Interface GigabitEthernet 0/12, changed state to down.
Nov 18 15:06:58 192.168.10.111   *Nov 18 15:16:14: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet 0/12, changed state to down.
Nov 18 15:07:01 192.168.10.111   *Nov 18 15:16:17: %LINK-3-UPDOWN: Interface GigabitEthernet 0/11, changed state to up.
Nov 18 15:07:01 192.168.10.111   *Nov 18 15:16:17: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet 0/11, changed state to up.
Nov 18 15:07:04 192.168.10.111   *Nov 18 15:16:20: %LINK-3-UPDOWN: Interface GigabitEthernet 0/13, changed state to up.
I need to type in a keyword and get back every complete line that contains it.




Author: youzhengyu    Time: 2018-03-29 10:32
import zipfile

def test():
    path = "/opt/2018-03-26.zip"
    if zipfile.is_zipfile(path):
        f = zipfile.ZipFile(path, 'r')
        files = f.namelist()
        print('files name:', files)
        iplog = input("select log file : ")
        mess = f.open(iplog)
        # ZipFile.open() yields bytes, so decode before splitting into lines
        l = mess.read().decode("utf-8", "replace").splitlines()
        while True:
            a = input("please input : ")
            # check for "quit" first so it is never treated as a search term
            if a == "quit":
                break
            # substring test: matches the keyword anywhere in the line,
            # not only when it equals a whole space-separated field
            t = [x for x in l if a in x]
            if t:
                print("\n".join(t))
            else:
                print("don't match it!")
        f.close()
    else:
        print(111111)   # path is not a zip file


if __name__ == "__main__":
    test()

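Author: youzhengyu    Time: 2018-03-29 10:39
Reply to 2# youzhengyu

I'm a complete beginner. The goal is to read the log for a given IP out of a compressed archive and then run a fuzzy search on it. I have only just started learning Python regular expressions and am not comfortable with them yet, so I can't get the fuzzy search to work.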
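
For the fuzzy search over a log inside the archive, something along these lines might work. It is only a sketch in Python 3, not tested against the real logs: the function name grep_zip_member and the ignore_case parameter are made up for illustration, and it assumes the log is UTF-8 text. The member is decoded and scanned line by line, so the whole log never has to be held in memory.

```python
import io
import re
import zipfile


def grep_zip_member(zip_path, member, pattern, ignore_case=True):
    """Yield every line of `member` inside `zip_path` that matches `pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    with zipfile.ZipFile(zip_path, "r") as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so lines are decoded lazily, one at a time.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            for line in text:
                # search() matches anywhere in the line, which is what makes
                # the lookup "fuzzy" compared with an exact field comparison.
                if regex.search(line):
                    yield line.rstrip("\n")
```

Calling it with a pattern such as r"0/1[12].*down" on the sample log above should return only the lines for ports 0/11 and 0/12 that went down; the member name to pass is whatever f.namelist() reports for the chosen IP.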
Author: dahe_1984    Time: 2018-03-29 13:18
Last edited by dahe_1984 on 2018-03-29 13:19
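# logname is the path of the log file to search
with open(logname, 'r') as f:
    lines = f.readlines()   # reads every line into memory at once

for eachline in lines:
    if "key" in eachline:   # plain substring test, no regex needed
        print(eachline, end="")   # the line already ends with a newline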



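No regex is needed for this; checking whether the keyword is in the line is enough. If the file is very large, f.readlines() accepts a size hint, so the file can be read in chunks.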
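
A rough Python 3 sketch of the chunked variant, for illustration only. The function name grep_in_chunks and the 1 MiB default are arbitrary choices, not something from the original post, and the file is assumed to be UTF-8 text.

```python
def grep_in_chunks(path, keyword, chunk_bytes=1 << 20):
    """Return the lines of `path` containing `keyword`, reading about chunk_bytes at a time."""
    hits = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            # readlines(hint) stops reading once the lines collected so far
            # exceed roughly `hint` in size, and it only returns whole lines.
            lines = f.readlines(chunk_bytes)
            if not lines:   # an empty list means end of file
                break
            hits.extend(line.rstrip("\n") for line in lines if keyword in line)
    return hits
```

Only one chunk of lines is alive at a time, so peak memory depends on chunk_bytes and on the number of hits rather than on the file size.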
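Author: youzhengyu    Time: 2018-03-29 15:43
Reply to 5# dahe_1984

I followed your approach and the feature works, but searching larger text files is far too slow. Do you have any suggestions for speeding it up?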
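Author: dahe_1984    Time: 2018-03-29 16:28
The method above is bound to be slow on large files, because it loads the entire file into memory at once.

The method below was written by someone else. I haven't tried it myself, but you can give it a go: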

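import contextlib, mmap, os, re

def method(filename):
    """use memory mapping and regex"""
    # mmap exposes bytes, so the pattern has to be a bytes pattern
    regex = re.compile(rb'\(TEMPS CP :[ ]*.*S\)')
    # map only about the last 15000 bytes; the offset must be a multiple
    # of mmap.ALLOCATIONGRANULARITY, so round it down
    offset = max(0, os.stat(filename).st_size - 15000)
    offset -= offset % mmap.ALLOCATIONGRANULARITY
    with open(filename, 'rb') as f:
        with contextlib.closing(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY, offset=offset)) as txt:
            match = regex.search(txt)
            if match:
                print(match.group().decode("utf-8", "replace"))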
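
To get back every full line containing a keyword, rather than a single regex match near the end of the file, the same memory-mapping idea could be adapted roughly as follows. This is an untested-on-real-data sketch in Python 3: grep_mmap is a made-up name, it uses plain byte search (mmap.find and mmap.rfind) instead of a regex, and it assumes UTF-8 text, a non-empty file (an empty file cannot be mapped), and a keyword without newlines.

```python
import mmap


def grep_mmap(path, keyword):
    """Return every complete line of `path` that contains `keyword`."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    needle = keyword.encode("utf-8")
    hits = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(needle)
        while pos != -1:
            # Expand the hit to line boundaries: rfind gives -1 on the first
            # line, so adding 1 yields offset 0 in that case.
            start = mm.rfind(b"\n", 0, pos) + 1
            end = mm.find(b"\n", pos)
            if end == -1:   # last line has no trailing newline
                end = len(mm)
            hits.append(mm[start:end].decode("utf-8", errors="replace").rstrip("\r"))
            # Resume after this line so it is reported only once.
            pos = mm.find(needle, end)
    return hits
```

Only the matching lines are copied out of the mapping; the rest of the file is paged in by the operating system as the search advances.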


Author: dahe_1984    Time: 2018-03-29 16:29
https://stackoverflow.com/questi ... y-to-grep-big-files
Author: dahe_1984    Time: 2018-03-29 18:05
$ time python test.py
2018-03-25 15:30:43.3183927|2018-03-25 15:30:47.3144751|53|2|0||460096508004246||4294967295|86|0|0||4173197834|0|1321726
474|460015362200001|460015362218771|165|0|0|0|255|255|41|11|12|7716200|4294967295|4082802691|4294967295|4294967295|42949
67295|0|0|0|4294967295|4294967295|0|0|13|137|1|3937|3949|3965|0|41|4294967295|4294967295|0|2750345762|149|905|4294967295
|4294967295|2750345762|2750345762|-1|-1||255|0|41|4294967295|4294967295|4294967295|4294967295|4294967295|4294967295|DFA2
EFFF938ED157A63DB3A35FFA0855|91724FC9C8DB0000A0F77F9BB7EA97AC|
2018-03-25 15:31:01.0656379|2018-03-25 15:31:07.2146508|53|2|0||460096508004246||4294967295|86|0|0||4173197834|0|1321726
474|460015362200001|460015362218771|165|0|0|0|255|255|41|11|12|7807197|4294967295|4082802691|4294967295|4294967295|42949
67295|0|0|0|4294967295|4294967295|0|0|13|230|1|6090|6102|6117|0|41|4294967295|4294967295|0|2750328015|246|1029|429496729
5|4294967295|2750328015|2750328015|-1|-1||255|0|41|4294967295|4294967295|4294967295|4294967295|4294967295|4294967295|441
3A281AF0A9B1BD4626671F12E69A8|4D54C214049B000007F5905DD76A819D|
2018-03-25 15:31:13.6364605|2018-03-25 15:31:17.8634221|53|2|0||460096508004246||4294967295|86|0|0||4189975050|0|1321726
474|460015362200001|460015362218771|165|0|0|0|255|255|41|11|12|7857227|4294967295|4082802691|4294967295|4294967295|42949
67295|0|0|0|4294967295|4294967295|0|0|15|358|1|4158|4180|4201|0|41|4294967295|4294967295|0|2750315444|371|1128|429496729
5|4294967295|2750315444|2750315444|-1|-1||255|0|41|4294967295|4294967295|4294967295|4294967295|4294967295|4294967295|6A9
488FCF8678499D8DACB5EE3B08833|110686CBB0F8000021741CC20125BC48|
2018-03-25 15:31:23.9351091|2018-03-25 15:31:28.0186655|53|2|0||460096508004246||4294967295|86|0|0||4173197834|0|1321726
474|460015362200001|460015362218771|165|0|0|0|255|255|41|11|12|7904743|4294967295|4082802691|4294967295|4294967295|42949
67295|0|0|0|4294967295|4294967295|0|0|15|190|1|4000|4018|4040|0|41|4294967295|4294967295|0|2750305145|201|982|4294967295
|4294967295|2750305145|2750305145|-1|-1||255|0|41|4294967295|4294967295|4294967295|4294967295|4294967295|4294967295|8CF6
E6A300000000E41C461000000000|8CA5AA9A58940000E49DEF4BC3E58A59|
2018-03-25 15:31:38.5038018|2018-03-25 15:31:42.5172268|53|2|0||460096508004246||4294967295|86|0|0||4189975050|0|1321726
474|460015362200001|460015362218771|165|0|0|0|255|255|41|11|12|7972459|4294967295|4082802691|4294967295|4294967295|42949
67295|0|0|0|4294967295|4294967295|0|0|12|152|1|3953|3964|3982|0|41|4294967295|4294967295|0|2750290577|166|932|4294967295
|4294967295|2750290577|2750290577|-1|-1||255|0|41|4294967295|4294967295|4294967295|4294967295|4294967295|4294967295|36BA
74BE595D652F02FCD6B8F96CD839|24EB5275EFC4000091BE0B5941D0EEBF|

real    0m4.956s
user    0m2.406s
sys     0m2.421s

$ du -sh todb-ZC_I_GBIUPS_PAGING_SERVICE_SDR-180325-15428-1521963014.dat
258M    todb-ZC_I_GBIUPS_PAGING_SERVICE_SDR-180325-15428-1521963014.dat


Author: youzhengyu    Time: 2018-03-30 14:45
Reply to 8# dahe_1984

Thank you, this was very enlightening.
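Author: dahe_1984    Time: 2018-03-30 14:51
Haha, no problem. My disk is an SSD, so for a file of a few hundred MB I simply read the whole thing in; disk I/O is usually the slow part.
On an SSD, mmap does not seem to be any faster than reading a few-hundred-MB file straight into memory.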
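
Anyone who wants to compare the two approaches on their own disk could start from a small timing helper like the one below. It is a sketch only: time_search is a hypothetical name, the needle must be passed as bytes, and the numbers depend heavily on the OS page cache (whichever method runs second benefits from the first having warmed the cache), so the file should be measured several times and in both orders before drawing conclusions.

```python
import mmap
import time


def time_search(path, needle):
    """Return (seconds_read, seconds_mmap, found) for locating `needle` (bytes) in `path`."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        # Method 1: read the whole file into memory, then search the bytes.
        found_read = f.read().find(needle) != -1
    t1 = time.perf_counter()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Method 2: search the memory-mapped file without an explicit read().
        found_mmap = mm.find(needle) != -1
    t2 = time.perf_counter()
    assert found_read == found_mmap   # both methods must agree
    return t1 - t0, t2 - t1, found_read
```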
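Author: youzhengyu    Time: 2018-03-30 14:55
Reply to 11# dahe_1984

Hahaha, most of my environment still runs on ordinary hard disks. I'll need to take my time to study and digest all this.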



