Chinaunix

[Text Processing] Question about regular expressions

Posted on 2013-10-31 16:21
Last edited by chenzhanyiczy on 2013-10-31 16:22

man regex

       Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name
       for either) enclosed in "[." and ".]" stands for the sequence of characters of that collating element. The sequence is a single element of the bracket expression’s
       list. A bracket expression containing a multi-character collating element can thus match more than one character, for example, if the collating sequence includes a "ch"
       collating element, then the RE "[[.ch.]]*c" matches the first five characters of "chchcc".
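
To make the paragraph above concrete, here is a toy Python model (an assumption-laden sketch, not how glibc or grep implement matching). A bracket expression is modelled as the set of collating elements it lists, and an element may be longer than one character. It assumes a collating sequence that has a "ch" element, as in the man page example; `match_ends`, `longest_match`, `ch_element` and `c_or_h` are made-up names, and only bracket atoms with an optional `*` are supported.

```python
from __future__ import annotations

# Toy model: a pattern is a list of atoms, each a pair (elements, starred).
# `elements` is the set of collating elements a bracket expression lists;
# a multi-character element such as "ch" is simply a longer string.


def match_ends(pattern: list[tuple[frozenset[str], bool]], text: str, pos: int = 0) -> set[int]:
    """Return every position where a match of `pattern` starting at `pos` can end."""
    if not pattern:
        return {pos}
    (elements, starred), rest = pattern[0], pattern[1:]
    ends = set()
    if starred:
        ends |= match_ends(rest, text, pos)  # zero repetitions of the starred atom
    for elem in elements:
        if text.startswith(elem, pos):
            nxt = pos + len(elem)  # one collating element may consume several characters
            # After a starred atom we may repeat it; otherwise continue with the rest.
            ends |= match_ends(pattern if starred else rest, text, nxt)
    return ends


def longest_match(pattern: list[tuple[frozenset[str], bool]], text: str) -> int | None:
    """Length of the longest match anchored at the start of `text` (POSIX prefers the longest)."""
    ends = match_ends(pattern, text, 0)
    return max(ends) if ends else None


# "[[.ch.]]*c", assuming the collating sequence has a "ch" element
ch_element = [(frozenset({"ch"}), True), (frozenset({"c"}), False)]
# "[ch]*c": an ordinary bracket expression listing the single characters c and h
c_or_h = [(frozenset({"c", "h"}), True), (frozenset({"c"}), False)]

text = "chchcc"
n = longest_match(ch_element, text)
print(n, text[:n])                   # 5 chchc
print(longest_match(c_or_h, text))   # 6
```

With the "ch" element, the star can only consume whole "ch" pairs, so the match stops at "chchc"; the ordinary `[ch]*c` consumes single characters and matches all six.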


       Within a bracket expression, a collating element enclosed in "[=" and "=]" is an equivalence class, standing for the sequences of characters of all collating elements
       equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were "[." and ".]".)
       For example, if o and ^ are the members of an equivalence class, then "[[=o=]]", "[[=^=]]", and "[o^]" are all synonymous. An equivalence class may not(!) be an
       end-point of a range.
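
The equivalence-class paragraph can be modelled the same way. This sketch assumes a hypothetical locale in which o and ^ are equivalent (the man page's example) and expands a bracket expression into the set of elements it matches. `EQUIV_CLASSES`, `equivalents` and `parse_bracket` are illustrative names; only plain characters, `[=x=]` classes and `a-b` ranges are handled (no leading `^` negation, `[.x.]`, character classes or literal `]`), and ranges use code-point order rather than real collation order. The error text mirrors glibc's "Invalid range end" message.

```python
# Hypothetical locale: o and ^ form one equivalence class, as in the man page example.
EQUIV_CLASSES = [frozenset({"o", "^"})]


def equivalents(elem):
    """All collating elements equivalent to `elem`, including itself."""
    for cls in EQUIV_CLASSES:
        if elem in cls:
            return set(cls)
    return {elem}  # no other equivalents: treated like [.elem.]


def parse_bracket(expr):
    """Expand a simple bracket expression such as "[[=o=]]" into the set it matches."""
    if len(expr) < 3 or expr[0] != "[" or expr[-1] != "]":
        raise ValueError("not a bracket expression")
    body = expr[1:-1]
    items = []  # (kind, value) pairs, kind is "char" or "equiv"
    i = 0
    while i < len(body):
        if body.startswith("[=", i):
            end = body.index("=]", i + 2)  # ValueError if the class is unterminated
            items.append(("equiv", body[i + 2:end]))
            i = end + 2
        else:
            items.append(("char", body[i]))
            i += 1

    result = set()
    j = 0
    while j < len(items):
        kind, value = items[j]
        if j + 2 < len(items) and items[j + 1] == ("char", "-"):
            end_kind, end_value = items[j + 2]
            if "equiv" in (kind, end_kind):
                # An equivalence class may not be an end-point of a range.
                raise ValueError("Invalid range end")
            # Simplification: code-point order instead of the locale's collation order.
            result |= {chr(c) for c in range(ord(value), ord(end_value) + 1)}
            j += 3
        elif kind == "equiv":
            result |= equivalents(value)
            j += 1
        else:
            result.add(value)
            j += 1
    return result


print(parse_bracket("[[=o=]]") == parse_bracket("[[=^=]]") == parse_bracket("[o^]"))  # True
print(sorted(parse_bracket("[[=o=]]")))  # ['^', 'o']
try:
    parse_bracket("[[=o=]-z]")
except ValueError as err:
    print(err)  # Invalid range end
```

In a locale with no equivalences defined for a character x, `[[=x=]]` collapses to just x, which is the "as if the delimiters were [. and .]" case.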


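I've never understood how this [..] syntax is supposed to be used.

The Chinese translation of the man page renders "collating element" as "归并元素" (roughly "merging element").

None of egrep, grep, sed, or awk seems to support multi-character collating elements.

Taking egrep as an example: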

regex.log:
0000ee0000ll000
0000ex0000ll000

egrep "[[.ee.]]" regex.log
egrep: Invalid collation character

egrep "[[.e.]]" regex.log
0000ee0000ll000
0000ex0000ll000
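
On the egrep results: whether `[.ee.]` is valid depends on whether the current locale's collation (LC_COLLATE) defines "ee" as a single collating element. In the C/POSIX locale every collating element is a single character, so only names like `[.e.]` resolve, and `[[.e.]]` then means the same as `[e]`. The toy Python model below sketches that lookup; `resolve_symbol`, `grep_collating_symbol`, `CollationError` and the `multi_char_elements` parameter are hypothetical, symbolic collating-sequence names are not modelled, and the error text is reused only to mirror the output shown above.

```python
# Lines of regex.log from the post.
LINES = ["0000ee0000ll000", "0000ex0000ll000"]


class CollationError(ValueError):
    """Raised when [.name.] names no collating element in the (modelled) locale."""


def resolve_symbol(name, multi_char_elements):
    """Resolve the text between "[." and ".]" to a collating element.

    Every single character is a collating element; a longer name is valid only
    if the hypothetical locale lists it in `multi_char_elements`.
    """
    if len(name) == 1 or name in multi_char_elements:
        return name
    raise CollationError("Invalid collation character")


def grep_collating_symbol(name, lines, multi_char_elements=frozenset()):
    """Model of `egrep "[[.name.]]"`: keep the lines that contain the element."""
    element = resolve_symbol(name, multi_char_elements)
    return [line for line in lines if element in line]


# C/POSIX-like locale: no multi-character collating elements at all.
print(grep_collating_symbol("e", LINES))  # ['0000ee0000ll000', '0000ex0000ll000']
try:
    grep_collating_symbol("ee", LINES)
except CollationError as err:
    print(err)  # Invalid collation character

# Hypothetical locale in which "ee" collates as a single element.
print(grep_collating_symbol("ee", LINES, {"ee"}))  # ['0000ee0000ll000']
```

So the multi-character wording in the man page is not meaningless; it only has an effect in a locale whose collation actually defines such elements.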

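If the multi-character form (i.e. [.several characters.]) is not supported, what does the sentence saying that a bracket expression containing a multi-character collating element can match more than one character actually mean? It seems to have no practical meaning.

PS: The "equivalence class" paragraph that follows is even harder to understand; it is more obscure still.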