I came across a classic Q&A about regexes on a mailing list, so I'm reposting it here:
Question:
if ($text =~ /(.*?($crlf))\2(.*)/sm) {
Do I read this right?
The '\2' repeats whatever the second group matched,
where match 1 is (.*?($crlf)) and
match 2 is $crlf?
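A minimal sketch of what that pattern does, assuming $crlf holds a literal "\r\n" (the thread never shows its value). Under that assumption \2 has to re-match the same line ending right away, so the regex splits the text at the first blank line, the way an e-mail or HTTP header is separated from its body. The sample message and the names $head, $eol and $body are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: $crlf holds a plain CR LF pair; the thread never shows it.
my $crlf = "\r\n";

# Made-up message: two header lines, a blank line, then a body.
my $text = "Subject: hi\r\nFrom: me\r\n\r\nline 1\r\nline 2";

# List context returns ($1, $2, $3):
#   $1 = text up to and including the first line ending that is
#        immediately followed by an identical one
#   $2 = that line ending (the text \2 has to repeat)
#   $3 = everything after the blank line
my ($head, $eol, $body) = $text =~ /(.*?($crlf))\2(.*)/sm;
die "no blank line found\n" unless defined $head;

print "header lines: ", join(' | ', split /\r\n/, $head), "\n";
print "body lines:   ", join(' | ', split /\r\n/, $body), "\n";
```

Because .*? is lazy, the engine stops at the first place where a line ending is immediately followed by a second copy of itself, not at the last one.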
Answer:
That depends; we don't know the contents of $crlf (although we can
probably guess). But if $crlf contains character classes and/or other
logic, interpolating the variable again will match any of the
possibilities, whereas the backreference will only match the literal
string previously matched.
Consider the following:
~/perl> perl -e'$abc="[abc]"; print "matched!\n" if "aa" =~ /($abc)\1/'
matched!
~/perl> perl -e'$abc="[abc]"; print "matched!\n" if "ab" =~ /($abc)\1/'
but:
~/perl> perl -e'$abc="[abc]"; print "matched!\n" if "ab" =~ /$abc$abc/'
matched!
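The same point applied to $crlf itself, as a sketch that assumes (hypothetically) $crlf is a pattern like '\r?\n' that accepts either a Windows or a Unix line ending. Interpolating it twice accepts a mixed pair of endings, while the backreference insists that the second ending be exactly the text captured first. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical value: $crlf holds regex logic rather than a fixed string.
# Single quotes keep the backslashes for the regex engine.
my $crlf = '\r?\n';

my $mixed = "\r\n\n";    # a CR LF followed by a bare LF
my $same  = "\r\n\r\n";  # two identical CR LF pairs

# Interpolated twice: each copy independently accepts either ending.
my $twice_mixed   = $mixed =~ /\A($crlf)$crlf\z/ ? 1 : 0;
# Backreference: the second ending must be the exact text captured first.
my $backref_mixed = $mixed =~ /\A($crlf)\1\z/   ? 1 : 0;
my $backref_same  = $same  =~ /\A($crlf)\1\z/   ? 1 : 0;

print "interpolated twice, mixed endings: $twice_mixed\n";
print "backreference, mixed endings:      $backref_mixed\n";
print "backreference, identical endings:  $backref_same\n";
```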
Follow-up question:
~/perl> perl -e'$abc="[abc]"; print "matched!\n" if "ab" =~ /$abc$abc/'
matched!
Why does this happen? What is the difference between /($abc)\1/ and
/$abc$abc/? Thanks.
Follow-up answer:
When part of a regex is stored in a variable, the contents of the
variable are interpolated before the regex is evaluated, so when the
match is performed,
"ab" =~ /$abc/
becomes
"ab" =~ /[abc]/
By the same token
"ab" =~ /$abc$abc/
becomes
"ab" =~ /[abc][abc]/
Each class can match a, b, or c, so the entire regex will match aa,
bb, cc, ab, ac, ba, bc, ca, or cb.
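A short script that checks this: it builds every two-letter string over a, b, c, plus a hypothetical 'd' as a negative control, and confirms that the interpolated /$abc$abc/ and the hand-written /[abc][abc]/ accept exactly the same strings, namely the nine combinations listed above. The patterns are anchored with \A and \z here (an addition for the demo) so only whole strings count.

```perl
use strict;
use warnings;

my $abc = "[abc]";

# Every two-letter string over a..d; 'd' is a hypothetical non-member
# added as a negative control.
my @letters = qw(a b c d);
my @pairs;
for my $x (@letters) {
    for my $y (@letters) {
        push @pairs, "$x$y";
    }
}

# Anchored with \A and \z so the whole two-letter string must match.
my @interp  = grep { /\A$abc$abc\z/ }   @pairs;  # variable interpolated
my @literal = grep { /\A[abc][abc]\z/ } @pairs;  # written out by hand

print "interpolated: @interp\n";
print "literal:      @literal\n";
```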
With
/($abc)\1/
however, \1 isn't evaluated until *after* the capturing parentheses do
their work, and \1 is replaced with whatever was captured: essentially
the value of $1 at whatever point the engine reaches \1 in the
expression.
The variable is interpolated first, so the expression becomes
/([abc])\1/
Then the engine begins evaluating the expression. As soon as the
parentheses capture something, the engine goes through and replaces \1
with the literal string captured.
In our example, then, ([abc]) matches "a" and stores the value "a" in
$1. Then all occurrences of \1 are replaced with "a".
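A quick check of what $1 ends up holding, using made-up input strings. In "bcc" the engine first captures "b" at offset 0, fails to find a second "b", and retries from offset 1, so the capture (and therefore what \1 must repeat) becomes "c". The hashes %captured and %start are just illustrative bookkeeping; $-[0] is Perl's built-in start offset of the last successful match.

```perl
use strict;
use warnings;

my $abc = "[abc]";
my (%captured, %start);   # per input: captured text and match offset

for my $s ("aa", "bcc", "abc") {
    if ($s =~ /($abc)\1/) {
        # $1 is what the group captured on the attempt that succeeded;
        # $-[0] is the offset where that successful attempt began.
        ($captured{$s}, $start{$s}) = ($1, $-[0]);
        print "$s: matched, \$1 = '$1', starting at offset $start{$s}\n";
    } else {
        print "$s: no match\n";
    }
}
```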
/([abc])[abc]/
says "find me an a, b, or c, save it to $1, and find me another a, b, or c"
/([abc])\1/
says "find me an a, b, or c, save it to $1, and then find me another
of whatever it is that was just found"
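The two readings side by side, as a sketch run against the nine two-letter strings over a, b and c (anchored with \A and \z for the demo): the class-twice pattern accepts all nine, while the backreference version accepts only the doubled letters.

```perl
use strict;
use warnings;

my @letters = qw(a b c);
my @pairs;
for my $x (@letters) {
    for my $y (@letters) {
        push @pairs, "$x$y";
    }
}

# "an a, b, or c, then another a, b, or c"
my @class_twice = grep { /\A([abc])[abc]\z/ } @pairs;
# "an a, b, or c, then another of whatever was just found"
my @backref     = grep { /\A([abc])\1\z/ }    @pairs;

print "class twice:   @class_twice\n";
print "backreference: @backref\n";
```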
Of course, this only matters if the captured value is the result of
some logic or class operation.
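/(ab)ab/
and
/(ab)\1/
and
/(ab){2}/
are functionally equivalent (each matches "abab" and leaves "ab" in $1),
although the first one is likely the most efficient, since after the
capture it only matches plain literal text instead of looking up a
backreference or repeating a group.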
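A sketch that runs the three literal patterns against a few made-up inputs: each one accepts or rejects the same strings, and each leaves "ab" in $1 on a match. For (ab){2} that is the capture from its last repetition, which happens to be the same text here. The %results hash is only illustrative bookkeeping.

```perl
use strict;
use warnings;

my @patterns = (qr/(ab)ab/, qr/(ab)\1/, qr/(ab){2}/);
my @inputs   = ("abab", "xxababyy", "abba", "ab");

my %results;   # input => one "match:$1" or "no" entry per pattern
for my $s (@inputs) {
    for my $re (@patterns) {
        push @{ $results{$s} }, $s =~ $re ? "match:$1" : "no";
    }
    print "$s => @{ $results{$s} }\n";
}
```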
A real classic!