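Reply to #5 chenzhanyiczy
I read that thread again, and I keep feeling that what they discuss is exactly what you were asking about. Let me translate what they said, piece by piece.
Your earlier question was that you didn't know how to use [..], because even after using [[.ae]] you still got an error.
In their thread, the OP raised this problem: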
In the glibc locale definitions, collating elements have a hyphenated name:
In the glibc locale definitions, collating elements have names containing a hyphen '-'.
collating-symbol <zs>
(this is the form without the hyphen)
collating-element <z-s> from "<U007A><U0073>"
(this is the form with the hyphen)
and the hyphenated name have to be used in regular expression for [[. .]] to
work properly:
That is, the hyphenated name must be used in the regular expression for [[. .]] to work:
$ echo '*ch*' | LC_COLLATE=cs_CZ.UTF-8 sed 's/[[.c-h.]]//'
**
This works: sed really does remove the "ch".
$ echo 'ch' | LC_COLLATE=cs_CZ.UTF-8 sed 's/[[.ch.]]//'
sed: -e expression #1, char 12: Invalid collation character
The form without the hyphen does not work.
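The same experiment can be run from C through the POSIX regcomp() API instead of sed. This is only a sketch: try_bre_in_locale is a hypothetical helper name, and it assumes, like the sed commands above, that setting LC_COLLATE alone is enough to select the collation rules regcomp() uses.

```c
#include <locale.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: try to compile PATTERN (a POSIX basic RE, as sed
 * uses) with LC_COLLATE temporarily set to LOCALE_NAME, then restore the
 * previous LC_COLLATE.
 * Returns -1 if the current locale cannot be saved or LOCALE_NAME cannot
 * be selected (e.g. it is not installed), 0 if regcomp() accepts the
 * pattern, otherwise regcomp()'s error code (REG_ECOLLATE is the
 * "Invalid collation character" error that sed printed above). */
static int try_bre_in_locale(const char *locale_name, const char *pattern)
{
    /* setlocale(..., NULL) may return a static buffer, so copy it. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    char *saved = cur ? malloc(strlen(cur) + 1) : NULL;
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, locale_name) == NULL) {
        free(saved);
        return -1;
    }

    regex_t re;
    int rc = regcomp(&re, pattern, 0);   /* 0 = basic RE syntax */
    if (rc == 0)
        regfree(&re);

    setlocale(LC_COLLATE, saved);        /* put the old collation back */
    free(saved);
    return rc;
}
```

With cs_CZ.UTF-8 installed, try_bre_in_locale("cs_CZ.UTF-8", "[[.ch.]]") corresponds to the second sed command; per the thread, on glibc before 2.18 it fails with REG_ECOLLATE while "[[.c-h.]]" compiles. In the plain "C" locale, "[[.ch.]]" is always rejected because "ch" is not a collating element there.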
However, POSIX 1.2008 says:
A collating symbol is a collating element enclosed within
bracket-period ( "[." and ".]" ) delimiters. Collating
elements are defined as described in Collation Order .
Conforming applications shall represent multi-character
collating elements as collating symbols when it is
necessary to distinguish them from a list of the
individual characters that make up the multi-character
collating element. For example, if the string "ch" is a
collating element defined using the line:
collating-element <ch-digraph> from "<c><h>"
in the locale definition, the expression "[[.ch.]]" shall
be treated as an RE containing the collating symbol 'ch',
while "[ch]" shall be treated as an RE matching 'c' or
'h' . Collating symbols are recognized only inside
bracket expressions. If the string is not a collating
element in the current locale, the expression is invalid.
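This passage explains the difference between a collating symbol and a collating element: a collating symbol is just a collating element wrapped in the [. and .] delimiters, i.e. the notation used inside a bracket expression to refer to that element. Hmm... so there doesn't seem to be much difference between the two?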
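A small C sketch (not from the thread) of the distinction the POSIX text draws, using the standard regcomp()/regexec() API in the default "C" locale, where "ch" is not a collating element: [ch] matches a single 'c' or 'h', and [[.ch.]] is an invalid expression. bre_matches is a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if SUBJECT matches the POSIX basic RE
 * PATTERN, 0 if it does not, and -1 if PATTERN fails to compile.
 * Uses the process's current LC_COLLATE, which is the "C" locale unless
 * setlocale() has been called. */
static int bre_matches(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;                 /* e.g. [[.ch.]] with no "ch" element */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

So bre_matches("^[ch]$", "ch") returns 0 (the bracket expression covers one character), and bre_matches("[[.ch.]]", "ch") returns -1 in the "C" locale.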
POSIX especially mentions [[.ch.]] in the example instead of [[.ch-digraph.]] so
this is a bug in glibc. It shouldn't be hard to fix it in regcomp.
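POSIX's example uses [[.ch.]], i.e. the characters themselves, rather than [[.ch-digraph.]], the element's hyphenated name from the locale definition. Since his test showed that the non-hyphenated form fails, he considers this a glibc bug.
Then the first reply said: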
> POSIX especially mentions [[.ch.]] in the example instead of [[.ch-digraph.]]
> so this is a bug in glibc. It shouldn't be hard to fix it in regcomp.
The easiest fix would be, in my opnion, to rename all the
collation-element names for digraphs from their hyphenated
form to the non-hyphenated form. But a few users may have
gotten used to using the hyphented forms, working around
this bug in glibc. They would be pissed. So for quite a
while both forms will have to recognized. Attached patch
is an attempt to do this -- when the user specifies [.xx.],
it will first try to look up "xx" in the table of collation
elements, and when that fails, it will look up "x-x". Is
this what you had in mind, Paolo?
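The simplest fix would be to rename the hyphenated collating-element names for digraphs to the non-hyphenated form. But a few users may already be used to the hyphenated form (as a workaround for this bug), and they would be hurt if it were changed. So for a while both forms have to be accepted. My attached patch does this: for [.xx.] it first looks up "xx" in the table of collating elements, and if that fails, it looks up "x-x".
Here is the change from his attached patch: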
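The fallback strategy can be sketched in isolation. This is an illustration under assumptions, not glibc code: the name table and the functions find_name and lookup_collating_element are hypothetical stand-ins (glibc's real table is searched by seek_collating_symbol_entry()), and the hyphenated name is built in a local buffer instead of rewriting the caller's name in place as the patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a locale's collating-element names. */
static const char *const coll_names[] = { "c-h", "z-s" };

/* Linear search; returns the index of NAME, or -1 if absent. */
static int find_name(const char *name)
{
    for (size_t i = 0; i < sizeof coll_names / sizeof coll_names[0]; i++)
        if (strcmp(coll_names[i], name) == 0)
            return (int) i;
    return -1;
}

/* Look NAME up as written; if that fails and NAME is two characters
 * long, retry with a hyphen inserted ("ch" -> "c-h"). */
static int lookup_collating_element(const char *name)
{
    int idx = find_name(name);
    if (idx < 0 && strlen(name) == 2) {
        char hyphenated[4] = { name[0], '-', name[1], '\0' };
        idx = find_name(hyphenated);
    }
    return idx;
}
```

Both "c-h" and "ch" resolve to the same entry, which is the compatibility property the patch aims for: old hyphenated spellings keep working while the POSIX-style spelling starts working.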
       int32_t elem, idx;
       elem = seek_collating_symbol_entry (br_elem->opr.name,
                                           sym_name_len);
+      if (symb_table[2 * elem] == 0 && sym_name_len == 2)
+        {
+          br_elem->opr.name[3] = '\0';
+          br_elem->opr.name[2] = br_elem->opr.name[1];
+          br_elem->opr.name[1] = '-';
+          elem = seek_collating_symbol_entry (br_elem->opr.name, 3);
+        }
       if (symb_table[2 * elem] != 0)
         {
           /* We found the entry. */
The lines marked with + (lines 2850 to 2856 of the patched file) are his additions. They clearly add handling for the non-hyphenated form: if a two-character name is not found, it is rewritten as x-x (e.g. ch becomes c-h) and looked up again.
(Untested patch. I'm just majorly annoyed that collation elements
don't work as one would expect from the documenmtation.)
The last reply said:
Fixed in 2.18.
So this problem was fixed in glibc 2.18.