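1. What kinds of problems do you solve with regular expressions in your day-to-day work?
Everything I use them for comes from real scenarios, and the regex is only one part of each.
a) My previous company's product processed logs: it had to match log lines and parse the variables out of them. This is the simplest case. The regex tools I used were Regular.exe, RegexBuddy.exe, and Regex Match Tracer.
- IDS events: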
- <82>IDSName:msensorgiga3;EventName:snmp_uservars:bad_commname;Count:1;SIP:10.28.3.99;1052;DIP:10.28.47.17;161;Time:2005-03-18 03:50:51;Type:^ce^b4^d6^aa;Severity:^d6^d0^b7^e7^cf^d5;Bad SNMP community name from 10.28.3.99 to 10.28.47.17
- <82>IDSName:msensorgiga3;EventName:dns_labels:binary;Count:1;SIP:10.28.8.100;15000;DIP:210.22.14.9;53;Time:2005-03-18 03:50:31;Type:^b9^a5^bb^f7;Severity:^d6^d0^b7^e7^cf^d5;10.28.8.100 -> 210.22.14.9 id 14080 DNS label contains binary data
- <82>IDSName:msensorgiga3;EventName:www2_uservars:unsafe_method;Count:1;SIP:10.28.70.7;4862;DIP:10.0.241.102;80;Time:2005-03-18 03:50:22;Type:^b9^a5^bb^f7;Severity:^d6^d0^b7^e7^cf^d5;10.28.70.7 -> 10.0.241.102: Unsafe method seen: POLL
- <82>IDSName:msensorgiga3;EventName:tftp_opcode;Count:1;SIP:10.28.7.96;15000;DIP:221.214.148.244;69;Time:2005-03-21 17:26:40;Type:探测;Severity:中风险;10.28.7.96 -> 221.214.148.244: Suspicious opcode in TFTP transfer
- <82>IDSName:msensorgiga3-4507;EventName:tftp_opcode;Count:1;SIP:10.28.4.154;57777;DIP:211.244.33.95;69;Time:2005-04-03 20:22:02;Type:探测;Severity:中风险;10.28.4.154 -> 211.244.33.95: Suspicious opcode in TFTP transfer
- <82>IDSName:Sensor-B;EventName:snmp_uservars:bad_commname;Count:1;SIP:132.194.68.102;60856;DIP:10.28.68.121;161;Time:2007-03-21 18:44:14;Type:未知;Severity:中风险;Bad SNMP community name from 132.194.68.102 to 10.28.68.121
- Matching rules:
- <\d+>IDSName:([^;]+);EventName:([^:]+):([^;]+);Count:(\d+);SIP:([^;]+);(\d+);DIP:([^;]+);(\d+);Time:([^;]+);Type:([^;]+);Severity:([^;]+);([^;]+)
- <\d+>IDSName:([^;]+);EventName:([^;]+);Count:(\d+);SIP:([^;]+);(\d+);DIP:([^;]+);(\d+);Time:([^;]+);Type:([^;]+);Severity:([^;]+);([^;]+)
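As a rough illustration of how the two matching rules above can be applied, here is a minimal Python sketch. The function name `parse_ids_event` and the idea of trying the rules in order are my own assumptions, not something the original product necessarily did; the patterns themselves are copied from the rules above.

```python
import re

# The two matching rules, tried in order: the first expects
# "EventName:<name>:<detail>", the second a plain "EventName:<name>".
RULES = [
    re.compile(r"<\d+>IDSName:([^;]+);EventName:([^:]+):([^;]+);Count:(\d+);"
               r"SIP:([^;]+);(\d+);DIP:([^;]+);(\d+);Time:([^;]+);"
               r"Type:([^;]+);Severity:([^;]+);([^;]+)"),
    re.compile(r"<\d+>IDSName:([^;]+);EventName:([^;]+);Count:(\d+);"
               r"SIP:([^;]+);(\d+);DIP:([^;]+);(\d+);Time:([^;]+);"
               r"Type:([^;]+);Severity:([^;]+);([^;]+)"),
]

def parse_ids_event(line):
    """Return the captured fields of the first rule that matches, else None."""
    for rule in RULES:
        m = rule.match(line)
        if m:
            return m.groups()
    return None

if __name__ == "__main__":
    sample = ("<82>IDSName:msensorgiga3;EventName:tftp_opcode;Count:1;"
              "SIP:10.28.7.96;15000;DIP:221.214.148.244;69;"
              "Time:2005-03-21 17:26:40;Type:探测;Severity:中风险;"
              "10.28.7.96 -> 221.214.148.244: Suspicious opcode in TFTP transfer")
    print(parse_ids_event(sample))
```

An event whose name contains a colon yields 12 fields (name and detail separately); one without a colon falls through to the second rule and yields 11.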
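b) A custom login log for Unix systems. Getting the login user name, IP address, and so on takes only simple use of cut, grep, and awk. That said, the full document behind the code below took me a lot of effort; anyone interested can dig into it.
- # Add content in /etc/profile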
- # Log "bash sh ksh" user login and command history
- up_client_ip=`(who am i|cut -d\( -f2|cut -d\) -f1)`
- if ( test -z "`echo $up_client_ip|awk '($1 ~/[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+/)'`" )
- then
- up_client_ip=`awk '/'$up_client_ip'/ {print $1}' /etc/hosts`
- fi
- up_nowtime=`(date +"%Y-%m-%d %T")`
- logger -p user.notice -- class=\"HOST_LOGIN\" type=\"2\" time=\"$up_nowtime\" src_ip=\"$up_client_ip\" dst_ip=\"192.168.100.90\" primary_user=\"\" secondary_user=\"`id|cut -d\( -f2|cut -d\) -f1`\" operation=\"\" content=\"login successful\" authen_status=\"Success\" log_level=\"1\" session_id=\"$$\" 2>/dev/null
- case "$0" in
- -bash)
- export PROMPT_COMMAND='logger -p user.notice -- class=\"HOST_COMMAND\" type=\"3\" time=\"`date +"%Y-%m-%d %T"`\" src_ip=\"$up_client_ip\" dst_ip=\"192.168.100.90\" primary_user=\"\" secondary_user=\"`id|cut -d\( -f2|cut -d\) -f1`\" operation=\"$(history 1 | { read x y; echo $y; })\" content=\"command\" authen_status=\"\" log_level=\"1\" session_id=\"$$\" 2>/dev/null;'
- ;;
- -ksh)
- function log2syslog
- {
- logger -p user.notice -- class=\"HOST_COMMAND\" type=\"3\" time=\"`date +"%Y-%m-%d %T"`\" src_ip=\"$up_client_ip\" dst_ip=\"192.168.100.90\" primary_user=\"\" secondary_user=\"`id|cut -d\( -f2|cut -d\) -f1`\" operation=\"`fc -ln -0`\" content=\"command\" authen_status=\"\" log_level=\"1\" session_id=\"$$\" 2>/dev/null;
- }
- trap log2syslog DEBUG;
- ;;
- esac
- readonly up_client_ip
- readonly up_nowtime
- readonly PROMPT_COMMAND
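The IP-detection step at the top of the profile snippet (take the text in parentheses from `who am i`, and fall back to /etc/hosts if it is not a dotted address) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and unlike the awk lookup, which matches the name anywhere on a hosts line, this sketch compares whole host names.

```python
import re

# Dotted-quad check with the dots escaped, so "a1b2c3d4" is not mistaken
# for an address.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def client_from_who(who_line):
    """Return the text inside the parentheses of a `who am i` line, or ''."""
    m = re.search(r"\(([^)]*)\)", who_line)
    return m.group(1) if m else ""

def resolve_client_ip(who_line, hosts_text):
    """Return the client IP, looking a host name up in hosts-file text."""
    client = client_from_who(who_line)
    if IPV4.match(client):
        return client
    for line in hosts_text.splitlines():
        # Drop comments, then split into "address name [alias...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and client in fields[1:]:
            return fields[0]
    return ""

if __name__ == "__main__":
    hosts = "127.0.0.1 localhost\n192.168.1.1 gateway gw # router\n"
    print(resolve_client_ip("alice pts/0 2024-01-01 10:00 (10.1.2.3)", hosts))
    print(resolve_client_ip("alice pts/1 2024-01-01 10:00 (gateway)", hosts))
```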
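c) The company needed to analyze and count certain attack behaviors in web logs, which led to this script. Note: the web logs are stored in one directory per IP, each directory holds the log files, and the log file names contain the date. The keywords file defines the keywords (judging from the script, one key=value pair per line), and the server_ip file lists the IPs whose web logs are to be analyzed.
- logdir=/var/log/netscaler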
- analysedir=/var/www/html/seclog
- yesterday=`(date -d yesterday +"%Y-%m-%d")`
- today=`(date +"%Y-%m-%d")`
- function LOG_ANALYSE
- {
- cd $analysedir
- echo $SERVER_IP
- if [ ! -d $SERVER_IP ];
- then mkdir $SERVER_IP;
- fi
- if [ ! -d $SERVER_IP/$yesterday ];
- then mkdir $SERVER_IP/$yesterday;
- else rm -rf $SERVER_IP/$yesterday/*;
- fi
- for VALUE in `cat keywords |egrep -v "^$|^#"|awk -F"=" '{print $2}'`;
- do
- KEY=`grep "=$VALUE" keywords|egrep -v "^$|^#"|cut -d\= -f1`;
- grep -i "$VALUE" $logdir/$SERVER_IP/*$yesterday.log* >>$SERVER_IP/$yesterday/"$SERVER_IP"_"$KEY".result;
- done
- cd $SERVER_IP/$yesterday/
- awk '{print $3"\t"$9}' *.result >>analyse_"$yesterday"
- #sed -r 's/.* (\S+) \S+ HTTP \S+ \S+ \S+ (\S+) .*/\1 \2/'*.result >>analyse_"$yesterday"
- echo "url unique_ips pv">>count_"$yesterday"
- echo "--------------------------------------------------------------------">>count_"$yesterday"
- awk '{a[$2]++;if(!b[$2"_"$1]){b[$2"_"$1]=1;n[$2]++}}END{for(i in a) printf "%-45s %-20s %s\n",i,n[i],a[i]}' analyse_"$yesterday" | sort -k3n >>count_"$yesterday"
- echo "IP urls_visited hits">>count_"$yesterday"
- echo "--------------------------------------------------------------------">>count_"$yesterday"
- awk '{a[$1]++;if(!b[$1"_"$2]){b[$1"_"$2]=1;n[$1]++}}END{for(i in a) printf "%-45s %-20s %s\n",i,n[i],a[i]}' analyse_"$yesterday" | sort -k3n >>count_"$yesterday"
- cd $analysedir
- }
- for SERVER_IP in `cat $analysedir/server_ip|egrep -v "^$|^#"`;
- do LOG_ANALYSE;
- done
- cd $analysedir
- cat /dev/null >$analysedir/analyse_"$yesterday"_all
- cat /dev/null >$analysedir/count_"$yesterday"_all
- for SERVER_IP in `cat $analysedir/server_ip|egrep -v "^$|^#"`;
- do cat $analysedir/$SERVER_IP/$yesterday/analyse_"$yesterday" >>$analysedir/analyse_"$yesterday"_all;
- done
- echo "url unique_ips pv">>$analysedir/count_"$yesterday"_all
- echo "--------------------------------------------------------------------">>$analysedir/count_"$yesterday"_all
- awk '{a[$2]++;if(!b[$2"_"$1]){b[$2"_"$1]=1;n[$2]++}}END{for(i in a) printf "%-45s %-20s %s\n",i,n[i],a[i]}' analyse_"$yesterday"_all | sort -k3n >>$analysedir/count_"$yesterday"_all
- echo "IP urls_visited hits">>$analysedir/count_"$yesterday"_all
- echo "--------------------------------------------------------------------">>$analysedir/count_"$yesterday"_all
- awk '{a[$1]++;if(!b[$1"_"$2]){b[$1"_"$2]=1;n[$1]++}}END{for(i in a) printf "%-45s %-20s %s\n",i,n[i],a[i]}' analyse_"$yesterday"_all | sort -k3n >>$analysedir/count_"$yesterday"_all
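The awk one-liners in this script do the actual counting: for each key (URL or IP) they keep a hit count and a count of distinct partners (IPs per URL, or URLs per IP), then sort by hits. A minimal Python sketch of the same idea, with hypothetical function names and made-up sample records, might look like this:

```python
from collections import defaultdict

def count_by_key(pairs):
    """pairs: iterable of (key, other).

    Returns {key: (number of distinct others, total hits)}, mirroring the
    a[] / b[] / n[] arrays in the awk script.
    """
    hits = defaultdict(int)
    distinct = defaultdict(set)
    for key, other in pairs:
        hits[key] += 1
        distinct[key].add(other)
    return {k: (len(distinct[k]), hits[k]) for k in hits}

def report(pairs):
    """Format rows like the awk printf, sorted by hits ascending (sort -k3n)."""
    rows = sorted(count_by_key(pairs).items(), key=lambda kv: kv[1][1])
    return ["%-45s %-20s %s" % (k, d, h) for k, (d, h) in rows]

if __name__ == "__main__":
    # (ip, url) records, as in the two columns of the analyse_* file.
    records = [("1.1.1.1", "/a"), ("1.1.1.1", "/a"),
               ("2.2.2.2", "/a"), ("1.1.1.1", "/b")]
    print("\n".join(report((url, ip) for ip, url in records)))  # per URL
    print("\n".join(report(records)))                           # per IP
```

Using a set per key avoids the composite `b[$2"_"$1]` marker array that awk needs to track pairs it has already seen.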
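d) To check whether logs contain sensitive information such as credit card numbers or ID card numbers, I wrote a simple script that just uses grep and egrep.
- ####################################################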
- echo "log contain ID Number:"
- echo "-------------------------------------------"
- egrep -a "\b[0-9]{6}[12][890][0-9]{2}0[1-9][0-3][0-9][0-9]{3}[0-9xX]\b|\b[0-9]{6}[12][890][0-9]{2}1[0-2][0-3][0-9][0-9]{3}[0-9xX]\b|\b[0-9]{6}[0-9]{2}0[1-9][0-9]{5}\b|\b[0-9]{6}[0-9]{2}1[0-2][0-9]{5}\b|持卡人证件号" */*
- echo
- echo
- ####################################################
- echo "log contain Credit card number:"
- echo "-------------------------------------------"
- #egrep "4[0-9]{15}|4[0-9]{12}|5[1-5][0-9]{14}|6011[0-9]{12}|65[0-9]{14}|3[47][0-9]{13}|30[0-5][0-9]{11}|3[68][0-9]{12}|2131[0-9]{11}|1800[0-9]{11}|35[0-9]{3}[0-9]{11}" */*
- egrep -a "\b4[0-9]{15}\b|\b4[0-9]{12}\b|\b5[1-5][0-9]{14}\b|\b6011[0-9]{12}\b|\b65[0-9]{14}\b|\b3[47][0-9]{13}\b|\b30[0-5][0-9]{11}\b|\b3[68][0-9]{12}\b|\b2131[0-9]{11}\b|\b1800[0-9]{11}\b|\b35[0-9]{3}[0-9]{11}\b|BankCardNumber" */*
- echo
- echo
- ####################################################
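Digit patterns like these tend to produce false positives (order numbers, timestamps, and so on). One common way to narrow the card-number hits, which is an addition of mine and not part of the script above, is to run each candidate through the Luhn checksum. A minimal sketch, reusing the card prefixes from the egrep pattern; the function names are hypothetical:

```python
import re

# Same alternatives as the egrep pattern, wrapped in one pair of \b anchors.
CARD_RE = re.compile(
    r"\b(?:4[0-9]{15}|4[0-9]{12}|5[1-5][0-9]{14}|6011[0-9]{12}|65[0-9]{14}"
    r"|3[47][0-9]{13}|30[0-5][0-9]{11}|3[68][0-9]{12}|2131[0-9]{11}"
    r"|1800[0-9]{11}|35[0-9]{3}[0-9]{11})\b"
)

def luhn_ok(number):
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for pos, ch in enumerate(reversed(number)):
        d = int(ch)
        if pos % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return pattern matches in text that also pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_ok(m.group(0))]

if __name__ == "__main__":
    log_line = "pan=4111111111111111 other=4111111111111112 id=12345"
    print(find_card_numbers(log_line))
```

Here 4111111111111111 is a widely used test number that passes the checksum, while the second candidate matches the pattern but fails it.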
e) Checking for backdoors. For this I used ready-made code from the web.
- #!/usr/bin/python
- #-*- encoding:UTF-8 -*-
- ###
- ## @package
- ##
- ## @author CFC4N <cfc4nphp@gmail.com>
- ## @copyright copyright (c) Www.cnxct.Com
- ## @Version $Id: check_php_shell.py 37 2010-07-22 09:56:28Z cfc4n $
- ###
- import os
- import sys
- import re
- import time
- def listdir(dirs,liston='0'):
-     flog = open(os.getcwd()+"/check_php_shell.log","a+")
-     if not os.path.isdir(dirs):
-         print "directory %s is not exist"% (dirs)
-         return
-     lists = os.listdir(dirs)
-     for list in lists:
-         filepath = os.path.join(dirs,list)
-         if os.path.isdir(filepath):
-             if liston == '1':
-                 listdir(filepath,'1')
-         elif os.path.isfile(filepath):
-             filename = os.path.basename(filepath)
-             if re.search(r"\.(?:php|inc|html?)$", filename, re.IGNORECASE):
-                 i = 0
-                 iname = 0
-                 f = open(filepath)
-                 while f:
-                     file_contents = f.readline()
-                     if not file_contents:
-                         break
-                     i += 1
-                     match = re.search(r'''(?P<function>\b(?:include|require)(?:_once)?\b)\s*\(?\s*["'](?P<filename>[^;]*(?<!\.(?:php|inc)))["']\)?\s*''', file_contents, re.IGNORECASE| re.MULTILINE)
-                     if match:
-                         function = match.group("function")
-                         filename = match.group("filename")
-                         if iname == 0:
-                             info = '\n[%s] :\n'% (filepath)
-                         else:
-                             info = ''
-                         info += '\t|-- [%s] - [%s] line [%d] \n'% (function,filename,i)
-                         flog.write(info)
-                         print info
-                         iname += 1
-                     match = re.search(r'\b(?P<function>eval|proc_open|popen|shell_exec|exec|passthru|system)\b\s*\(', file_contents, re.IGNORECASE| re.MULTILINE)
-                     if match:
-                         function = match.group("function")
-                         if iname == 0:
-                             info = '\n[%s] :\n'% (filepath)
-                         else:
-                             info = ''
-                         info += '\t|-- [%s] line [%d] \n'% (function,i)
-                         flog.write(info)
-                         print info
-                         iname += 1
-                     match = re.search(r'(^|(?<=;|=))\s*`(?P<shell>[^`]+)`\s*;', file_contents, re.IGNORECASE)
-                     if match:
-                         shell = match.group("shell")
-                         if iname == 0:
-                             info = '\n[%s] :\n'% (filepath)
-                         else:
-                             info = ''
-                         info += '\t|-- [``] command is [%s] in line [%d] \n'% (shell,i)
-                         flog.write(info)
-                         print info
-                         iname += 1
-                 f.close()
-     flog.close()
- if '__main__' == __name__:
-     argvnum = len(sys.argv)
-     liston = '0'
-     if argvnum == 1:
-         action = os.path.basename(sys.argv[0])
-         print "Command is like:\n %s D:\wwwroot\ \n %s D:\wwwroot\ 1 -- recurse subfolders"% (action,action)
-         quit()
-     elif argvnum == 2:
-         path = os.path.realpath(sys.argv[1])
-         listdir(path,liston)
-     else:
-         liston = sys.argv[2]
-         path = os.path.realpath(sys.argv[1])
-         listdir(path,liston)
-     flog = open(os.getcwd()+"/check_php_shell.log","a+")
-     ISOTIMEFORMAT='%Y-%m-%d %X'
-     now_time = time.strftime(ISOTIMEFORMAT,time.localtime())
-     flog.write("\n----------------------%s checked ---------------------\n"% (now_time))
-     flog.close()
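To see what two of the script's patterns actually catch, here is a small Python 3 sketch that applies them to single lines. The patterns are copied from the script above; the `scan_line` helper and the sample PHP lines are hypothetical and only for illustration.

```python
import re

# Calls to functions that commonly appear in PHP web shells.
DANGEROUS_CALL = re.compile(
    r"\b(?P<function>eval|proc_open|popen|shell_exec|exec|passthru|system)\b\s*\(",
    re.IGNORECASE,
)
# A backtick command at line start or right after ';' or '='.
BACKTICK_SHELL = re.compile(r"(^|(?<=;|=))\s*`(?P<shell>[^`]+)`\s*;", re.IGNORECASE)

def scan_line(line):
    """Return a list of (kind, detail) findings for one line of PHP source."""
    findings = []
    m = DANGEROUS_CALL.search(line)
    if m:
        findings.append(("call", m.group("function")))
    m = BACKTICK_SHELL.search(line)
    if m:
        findings.append(("backtick", m.group("shell")))
    return findings

if __name__ == "__main__":
    for src in ["<?php eval($_POST['c']); ?>",
                "$out = `ls -la`;",
                "my_system($x);",
                "echo 'hello';"]:
        print(src, "->", scan_line(src))
```

The `\b` before the function name is what keeps `my_system(` from being reported: the underscore counts as a word character, so there is no word boundary in front of `system`.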
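2. Regular expression usage differs slightly across languages and scripting tools. What learning experience can you share with beginners?
a) To get started, read 《正则表达式30分钟入门教程》 ("Regular Expressions: A 30-Minute Introductory Tutorial"). Once you finish it you will have the basics; then try things out with the tools I mentioned above, and it is easy to pick up.
b) To understand regexes in depth, you need a suitable scenario that makes you keep digging into the syntax. In the log matching above, for example, what I used most in practice were boundaries (what delimits a position, such as \b), zero-width assertions, capturing, single-line mode, multi-line mode, case-insensitive matching, and so on. I won't go into detail here, since most people won't need it.
c) Regular expressions differ between languages: what is available in Java, Perl, and shell tools varies. Just check the relevant documentation.
d) Learning regular expressions on their own is not hard; mostly they are combined with other languages to accomplish a specific task.
e) To test a regex's efficiency, use RegexBuddy.exe to see the detailed matching process and the number of matching steps. In general, be explicit where you can (for example, use \d{1,3} rather than .*) and bound repetition where you can (for example, use \w{1,3} rather than \w+). There is also material on this online.
f) It is best to work from real scenarios; that is how you learn fastest.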
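The advice in e) is about correctness as much as speed. As a small illustration (the helper names are hypothetical, and the sample is a fragment of the IDS events from question 1), compare a loose `.*` with an explicit `\d{1,3}` pattern when pulling the source IP out of a line:

```python
import re

def loose_sip(line):
    """Greedy .* runs to the end of the line and backtracks to the last ';'."""
    m = re.search(r"SIP:(.*);", line)
    return m.group(1) if m else None

def strict_sip(line):
    """An explicit dotted-quad pattern stops exactly where the address ends."""
    m = re.search(r"SIP:(\d{1,3}(?:\.\d{1,3}){3});", line)
    return m.group(1) if m else None

if __name__ == "__main__":
    line = "SIP:10.28.3.99;1052;DIP:10.28.47.17;161;"
    print(loose_sip(line))   # captures far more than the address
    print(strict_sip(line))  # captures only the address
```

The explicit pattern leaves the engine little room to backtrack, which is the kind of difference a step-by-step debugger such as RegexBuddy makes visible.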