
Posted on 2009-09-27 15:33

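All of this can be found in help(re). I compared it with Perl and took some notes along the way.
In Perl (see the Extended Patterns section of perldoc perlre for more detail):
(0) Modifiers: /i ignore case, /s let . match a newline, /m multi-line ^ and $, /g global match, /o compile the pattern only once, /e evaluate the replacement as a Perl expression before substituting, /x ignore whitespace in the pattern
(1) (?pimsx-imsx) sets flags inside the pattern (flags after the - are turned off)
/(?option)pattern/ is equivalent to /pattern/option, where option is one of i, s, m, x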
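Python's re accepts the same inline-flag syntax, including the scoped (?flags:...) and (?-flags:...) forms (Python 3.6+). A minimal sketch; the sample strings are made up for illustration:

```python
import re

# (?i) at the very start applies IGNORECASE to the whole pattern,
# just like passing re.IGNORECASE as the flags argument.
whole = re.fullmatch(r"(?i)hello", "HeLLo")
print(whole is not None)  # True

# Scoped form (?i:...): the flag is on only inside the group.
scoped = re.compile(r"(?i:abc)DEF")
print(scoped.fullmatch("ABCDEF") is not None)  # True
print(scoped.fullmatch("abcdef") is not None)  # False: DEF stays case-sensitive

# (?-i:...) switches a flag off locally, like the "-" part of Perl's (?imsx-imsx).
off = re.compile(r"(?-i:abc)def", re.IGNORECASE)
print(off.fullmatch("abcDEF") is not None)  # True
print(off.fullmatch("ABCdef") is not None)  # False: abc must be lowercase
```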
(2) (?:pattern)
Groups without capturing the match: in /(?:a|b|c)(d|e)f\1/, \1 refers to (d|e).
(?imsx-imsx:pattern) is equivalent to (?:(?imsx-imsx)pattern)
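The same Perl example ported to Python, as a small sketch (the input strings are hypothetical). It shows how the non-capturing group changes what \1 refers to:

```python
import re

# Non-capturing group: (?:a|b|c) groups the alternation but gets no number,
# so \1 refers to (d|e).
nc = re.compile(r"(?:a|b|c)(d|e)f\1")
m = nc.search("xbdfd")
print(m.group(0), m.groups())  # bdfd ('d',)

# With an ordinary capturing group, (a|b|c) becomes group 1,
# so the same \1 now has to repeat the a/b/c letter instead.
cap = re.compile(r"(a|b|c)(d|e)f\1")
print(cap.search("xbdfd"))           # None
print(cap.search("xbdfb").groups())  # ('b', 'd')
```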
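(3) Positive lookahead: /pattern(?=string)/ matches pattern only when it is followed by string;
conversely, the negative lookahead (?!string) matches pattern only when it is not followed by string.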
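Lookahead works the same way in Python. A minimal sketch with a made-up file list; note that the lookahead text is checked but never becomes part of the match:

```python
import re

text = "main.py test.txt util.py"

# Positive lookahead: a word only if ".py" follows it; ".py" is not consumed.
py_stems = re.findall(r"\w+(?=\.py)", text)
print(py_stems)  # ['main', 'util']

# Negative lookahead: a word followed by "." but NOT by ".py".
non_py = re.findall(r"\b\w+(?=\.)(?!\.py)", text)
print(non_py)  # ['test']
```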
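(4) Positive lookbehind: /(?<=string)pattern/ matches pattern only when it is preceded by string; the negative form is /(?<!string)pattern/.
Named groups: (?<NAME>pattern) or (?'NAME'pattern), referenced with \k<NAME> or \k'NAME'. Perl also supports the Python-style named groups (?P<NAME>pattern) and (?P=NAME).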
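A Python sketch of lookbehind and named groups (sample strings are hypothetical). Two differences from Perl worth seeing in practice: Python's re rejects variable-width lookbehind, and this sketch spells named groups (?P<NAME>...), the form Python's re uses:

```python
import re

price_text = "USD 30, EUR 25, USD 12"

# Positive lookbehind: digits only when preceded by "USD ".
usd = re.findall(r"(?<=USD )\d+", price_text)
print(usd)  # ['30', '12']

# Negative lookbehind: numbers NOT preceded by "USD " (\b keeps us at number starts).
other = re.findall(r"(?<!USD )\b\d+", price_text)
print(other)  # ['25']

# Python's re requires lookbehind to be fixed-width; "USD +" is variable-width.
try:
    re.compile(r"(?<=USD +)\d+")
    fixed_width_required = False
except re.error:
    fixed_width_required = True
print(fixed_width_required)  # True

# Named group plus named backreference: find a doubled word.
dup = re.search(r"(?P<word>\w+) (?P=word)", "it is is here")
print(dup.group("word"))  # is
```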
(7) (?|pattern) Branch reset: the capture groups in each alternative are numbered starting from the same number.
  # before  ---------------branch-reset----------- after
  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
  # 1            2         2  3        2     3     4
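Python's re module does not implement branch reset, so the closest sketch is to show what it would save: compiling (?| fails, and the same pattern written with (?:...) needs 7 group numbers instead of Perl's 4. The input string "atuvz" is made up:

```python
import re

# (?|...) is not a recognised extension in Python's re; compiling it fails.
try:
    re.compile(r"(?|x(y)z|(p(q)r))")
    has_branch_reset = True
except re.error:
    has_branch_reset = False
print(has_branch_reset)  # False

# The Perl example without branch reset: every capturing group gets its own
# number (re.VERBOSE plays the role of Perl's /x here).
no_reset = re.compile(r" ( a ) (?: x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) ", re.VERBOSE)
print(no_reset.groups)  # 7

m = no_reset.search("atuvz")
print(m.groups())  # ('a', None, None, None, 't', 'v', 'z')
```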
(8)  (?(condition)yes-pattern|no-pattern)
   (?(condition)yes-pattern): see the explanation under Python (10) below
Some examples of special regex constructs (from the CU forum):
Backreferences and lookaround:
i) Lookahead (?=..)
  echo "ab2c121a" |perl -ne 'print $1 if /(.*?)(?=2)/;'  # prints ab
ii) Lookbehind (?<=..)
Conditional (?(1)...): < and > must appear together
  echo "<xx>"|perl -ne 'print $2 if /(<)?(\w*)(?(1)>)/'        # prints xx
  echo "<xx"|perl -ne 'print $2,"\n" if /(<)?(\w*)(?(1)>)/'    # prints an empty line
  echo "xx"|perl -ne 'print $2 if /(<)?(\w*)(?(1)>)/'          # prints xx
# (?(1)yes|no), e.g. extending the example above:
# with a leading < the word must end with >, otherwise it may end with a digit
  echo "<xx>"|perl -ne 'print $2 if /(<)?(\w*)(?(1)>|\d)/'     # prints xx
  echo "xx1"|perl -ne 'print $2 if /(<)?(\w*)(?(1)>|\d)/'      # prints xx
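A Python port of the lookahead and conditional one-liners above, as a sketch. It assumes the characters stripped from the forum's conditional patterns were < and >, i.e. (<)?(\w*)(?(1)>); the inputs are illustrative:

```python
import re

# i) Lazy match up to (but not including) the first "2".
lazy = re.search(r"(.*?)(?=2)", "ab2c121a").group(1)
print(lazy)  # ab

# Conditional: group 1 is an optional "<"; (?(1)>) demands a closing ">"
# only when the "<" was actually matched.
tag = re.compile(r"(<)?(\w*)(?(1)>)")
print(tag.search("<xx>").group(2))        # xx
print(repr(tag.search("<xx").group(2)))   # '' (falls back to an empty match at position 0)
print(tag.search("xx").group(2))          # xx

# With a no-branch: without "<", the word may instead end with a digit.
tag_or_digit = re.compile(r"(<)?(\w*)(?(1)>|\d)")
print(tag_or_digit.search("<xx>").group(2))  # xx
print(tag_or_digit.search("xx1").group(2))   # xx
```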
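In Python:
(Besides implementing these Perl extensions, Python has its own: if the '?' is immediately followed by P, it is a Python-specific extension.)
(1) (?iLmsux) Set the I, L, M, S, U, or X flag for the RE. It normally goes at the very start of the expression (Python 3.11+ requires this).
I  IGNORECASE  Ignore case.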
L  LOCALE    Make \w, \W, \b, \B, dependent on the current locale.
M  MULTILINE  "^" matches at the beginning of the string and at the beginning of each line (right after each newline).
              "$" matches at the end of the string and at the end of each line (right before each newline).
              (That is, a string containing \n is treated as a "multi-line" string.)
S  DOTALL     Make "." match any character, including a newline (the string is treated as a single line).
X  VERBOSE    Ignore whitespace and # comments in the pattern.
U  UNICODE    Make \w, \W, \b, \B dependent on the Unicode character properties database (the default for str patterns in Python 3).
For example:
pattern = """
^                   # beginning of string
M{0,4}              # thousands - 0 to 4 M's
(CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                    #            or 500-800 (D, followed by 0 to 3 C's)
(XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                    #        or 50-80 (L, followed by 0 to 3 X's)
(IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                    #        or 5-8 (V, followed by 0 to 3 I's)
$                   # end of string
"""
>>> re.search(pattern, 'M', re.VERBOSE)
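A short sketch of the M, S and X flags in action, both as flag arguments and inline; the text and the date pattern are made up for illustration:

```python
import re

text = "first line\nsecond line"

# MULTILINE: ^ also matches right after each newline.
starts = re.findall(r"^\w+", text)                  # ['first']
starts_m = re.findall(r"^\w+", text, re.MULTILINE)  # ['first', 'second']

# DOTALL: "." also matches the newline, so .* can run across lines.
one_line = re.search(r"first.*line", text).group()              # 'first line'
all_lines = re.search(r"first.*line", text, re.DOTALL).group()  # 'first line\nsecond line'

# Inline flags combine like the flags argument: (?ms) == re.M | re.S.
inline = re.search(r"(?ms)^second.*$", text).group()  # 'second line'

# VERBOSE: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # four-digit year
    -         # literal hyphen
    (\d{2})   # two-digit month
""", re.VERBOSE)
parts = date.search("posted 2009-09").groups()  # ('2009', '09')

print(starts, starts_m, repr(one_line), repr(all_lines), inline, parts)
```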
(2)  (?:...)    Non-capturing group (the group cannot be back-referenced)
   e.g. p=re.compile('(?:a|b|c)(d|e)')
        re.search(p,'adfg').groups() ---> ('d',)
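(3)  (?P<name>...) Named group. The matched text can be retrieved with the match object's group('name') method, and the group can be referenced inside the pattern with (?P=name).
(4)  (?P=name)    Backreference to a named group; to capture it as well, wrap it in another ().
e.g.: p = re.compile(r'(?P<word>\b\w+\b).*(?P=word)')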
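Using that example pattern in Python, plus the \g<name> syntax for named groups in re.sub replacements. A sketch; the sentences and the "me@box" string are hypothetical:

```python
import re

# (?P<word>...) names the group; (?P=word) must repeat exactly what it matched.
p = re.compile(r"(?P<word>\b\w+\b).*(?P=word)")
m = p.search("the cat saw the dog")
print(m.group("word"))  # the
print(m.group(0))       # the cat saw the
print(m.groupdict())    # {'word': 'the'}

# In a replacement string, a named group is referenced as \g<name>.
swapped = re.sub(r"(?P<word>\w+)@(?P<host>\w+)", r"\g<host>:\g<word>", "me@box")
print(swapped)  # box:me
```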
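(5)  (?#...)    Comment
e.g.: p = re.compile(r'(\b\w+\b)(?#this is a comment)')
(6)  (?=...)  Lookahead assertion: asserts that the text after the current position matches the expression; the match result does not include that text.
(7)  (?!...)  Negative lookahead assertion: asserts that the text after the current position does not match the expression; the match result does not include that text.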
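(8)  (?<=...)  Positive lookbehind assertion: asserts that the text before the current position matches the expression; the match result does not include that text. The expression must be fixed-width.
(9)  (?<!...)  Negative lookbehind assertion: asserts that the text before the current position does not match the expression; it must also be fixed-width.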
   e.g. p=re.compile(r'Rui (?=Zhang)')  p=re.compile(r'Rui (?!Zhang)')
        p=re.compile(r'(?<=Rui )Zhang')  p=re.compile(r'(?<!Rui )Zhang')
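(10) (?(id/name)yes|no)  If the group with the given id or name has matched, try to match yes; otherwise match no (no is optional). To capture this part as well, wrap it in another ().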
e.g.: p=re.compile(r'(<)?(\w*)(?(1)>)')
      p=re.compile(r'(<)?(\w*)(?(1)>|\d)')
      re.search(p,'<xx>').groups() ---> ('<', 'xx')
      re.search(p,'xx1').groups()  ---> (None, 'xx')
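Since (10) also accepts a group name, here is a sketch of the same idea using a named group, closely following the bracketed-address idea from the Python docs; the group names open and word are illustrative. re.match is used so the pattern must start at position 0:

```python
import re

# (?(open)>|$): if the named group "open" matched a "<", require ">",
# otherwise require the end of the string.
bracketed = re.compile(r"(?P<open><)?(?P<word>\w+)(?(open)>|$)")

print(bracketed.match("<xx>").group("word"))  # xx
print(bracketed.match("xx").group("word"))    # xx
print(bracketed.match("<xx"))                 # None: "<" without ">"
```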
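Regex operator precedence (from highest to lowest):
Operator                     Description
\                            Escape
(), (?:), (?=), []           Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}    Quantifiers
^, $, \anymetacharacter      Anchors and sequences
|                            Alternation ("or")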
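Two consequences of that table, as a small Python sketch with made-up inputs: alternation binds loosest, so anchors attach to single branches, and quantifiers bind tighter than concatenation:

```python
import re

# "|" binds loosest: ^ab|cd$ means (^ab)|(cd$), not ^(ab|cd)$.
loose = re.compile(r"^ab|cd$")
tight = re.compile(r"^(?:ab|cd)$")
print(loose.search("abXX") is not None)  # True: only the ^ab branch has to match
print(tight.search("abXX") is not None)  # False: the whole string must be ab or cd

# Quantifiers bind tighter than concatenation: ab* repeats only the b.
print(re.fullmatch(r"ab*", "abbb") is not None)      # True
print(re.fullmatch(r"ab*", "abab") is not None)      # False
print(re.fullmatch(r"(?:ab)*", "abab") is not None)  # True
```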
I found a post with a more detailed introduction:
http://daydayup.is-programmer.com/posts/1200.html

This article comes from the ChinaUnix blog; the original is at: http://blog.chinaunix.net/u2/65354/showart_2061223.html


Python's Extended Regular Expressions