Last edited by jyf1987 on 2013-05-31 15:59
Hi, I came across this passage in the GNU sed manual:
\(regexp\)
Groups the inner regexp as a whole, this is used to:
Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
Use back references (see below).
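To make the distinction in the quoted passage concrete, here is a minimal sketch. The input string `ababab` and the variable names are made up for illustration, and the `X` replacement is only there to show how much of the line each pattern consumed.

```shell
# Hypothetical input: three repetitions of "ab".

# Grouped: the star applies to the whole group, so \(ab\)* consumes
# every leading "ab" pair.
grouped=$(printf 'ababab\n' | sed 's/^\(ab\)*/X/')

# Ungrouped: in ab* the star binds only to "b", so the match is "a"
# followed by zero or more "b", i.e. just the first "ab".
ungrouped=$(printf 'ababab\n' | sed 's/^ab*/X/')

echo "$grouped"    # X
echo "$ungrouped"  # Xabab
```

Both commands use only basic regular expression syntax, so no GNU-specific flag is needed here.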
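So I tried to use this feature at work.
The real case is parsing an Apache log to extract the hits on specific URIs that we track. We are running a marketing campaign with two participation states, and the state depends on whether the URI's parameters contain a particular field (multi_ads): a request with multi_ads=xx is participation state A, and a request without it is participation state B.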
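The classification described above can be sketched roughly as follows. This is not the sed approach used in the rest of this post; it swaps in plain `grep -c` just to show the two states, and the sample URIs (`/promo`, `uid`) are hypothetical stand-ins for real log lines.

```shell
# Hypothetical request URIs standing in for lines pulled from the Apache log.
sample='/promo?uid=1&multi_ads=7
/promo?uid=2
/promo?uid=3&multi_ads=2'

# State A: the query string carries a multi_ads=... field.
state_a=$(printf '%s\n' "$sample" | grep -c '[?&]multi_ads=')

# State B: every line that does not carry the field.
state_b=$(printf '%s\n' "$sample" | grep -vc '[?&]multi_ads=')

echo "A=$state_a B=$state_b"   # A=2 B=1
```

Anchoring the field name on `[?&]` keeps a parameter whose name merely ends in `multi_ads` from being counted by accident.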
But when I actually used sed for the matching, I found that a group written as \(regex\)* could not be picked up by a back reference.
Talk is cheap, so here is a demo:
(
echo "blahblah&a=1&aa=1&b=2&ee=1&c=3&d=4"
echo "blahblah&a=2&aa=1&b=1&ee=1&c=3&d=4"
echo "blahblah&a=3&aa=1&b=2&ee=1&c=3&d=4"
echo "blahblah&a=4&aa=1&b=1&ee=1&c=3&d=4"
echo "blahblah&a=5&aa=1&b=2&ee=1&c=3&d=4"
echo "blahblah&a=6&aa=1&b=1&ee=1&d=4"
echo "blahblah&a=7&aa=1&b=2&ee=1&c=3&d=4"
) | sed -n 's/^.\+a\=\([0-9]\+\).\+b\=1.*&\(c\=[0-9]\)\?.\+/\1 \2/p'
Strangely, this version works:
( echo "a=1&b=2&c=3&d=4"; echo "a=3&b=2&c=3&d=4"; echo "a=3&b=2&d=4"; echo "a=1&b=4&c=3&d=4"; echo "a=1&b=2&d=4"; ) | sed -n 's/^a\=\([0-9]\+\).\+b\=2&\(c\=3\)\?.\+/\1 \2/p'