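A question about data testing
Below is an example for GB2312-80. GBK is similar. UTF-8, however, is a variable-length encoding and
is much more complicated; if you are interested, you can work that one out at your own pace.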
- # cat 1
- #!/bin/awk -f
- #
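- # Check for valid Chinese names in the GB2312-80 charset.
- # Assumes GB2312-encoded input and an awk whose length() counts bytes (e.g. run under LC_ALL=C).
- BEGIN {
- MAX1=247 # highest hanzi lead byte (0xF7)
- MIN1=176 # lowest hanzi lead byte (0xB0)
- MAX21=254 # highest trail byte (0xFE)
- MAX22=249 # highest trail byte when the lead byte is H (that row ends at 0xD7F9)
- MIN2=161 # lowest trail byte (0xA1)
- H=215 # lead byte 0xD7
- FS="\""
- }
- {
- a=$4 # the name is the 4th field when splitting on double quotes
- FS=" " # the od output read below is split on whitespace
- # Single quotes keep the shell from expanding characters such as ? in the name.
- cmd="echo '" a "' | od -A n -tu1"
- cmd | getline # $0 becomes the first od line: decimal byte values, then 10 for the newline
- close(cmd) # otherwise a repeated name would not run the command again
- bad=( length(a) < 4 ) # require at least two 2-byte characters; also catches empty names
- for ( i=1; !bad && i < NF; i+=2 ) {
- j=i + 1
- MAX2=MAX21
- if( $i == H )
- MAX2=MAX22
- if ( $i > MAX1 || $i < MIN1 || $j > MAX2 || $j < MIN2 )
- bad=1
- }
- if ( bad )
- print a " is invalid name at line " NR
- FS="\""
- }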
- # cat data
- "32051210000110900","汪龙仙",51.6,"310225380501482"
- "32051210000110900","汪龙",51.6,"310225380501482"
- "32051210000110900","汪",51.6,"310225380501482"
- "32051210000110901","赵飞燕",51.7,"310225380501483"
- "32051210000110902","杨玉环",51.8,"310225380501484"
- "32051210000110900","罾?",51.9,"310225380501485"
- # ./1 data
- 汪 is invalid name at line 3
- 罾? is invalid name at line 6
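The post says GBK works much like GB2312-80. As a rough sketch of what that could look like, the shell function below checks the same field against the three GBK hanzi zones (GBK/2, GBK/3 and GBK/4). The function name `check_gbk_names`, the helper `is_hanzi` and the `ord` lookup table are made up for this illustration and are not part of the original script. It assumes GBK-encoded input in the same record layout and an awk that works on single bytes under `LC_ALL=C`. It takes byte values from a table instead of calling od for every record.

```shell
# Hypothetical helper, not from the original post. Assumes GBK-encoded input
# in the same record layout and accepts only the GBK hanzi zones.
check_gbk_names() {
    LC_ALL=C awk -F'"' '
    BEGIN {
        # Byte-value lookup table; LC_ALL=C makes awk index strings by byte.
        for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    # Return 1 if the byte pair (b1, b2) lies in a GBK hanzi zone.
    function is_hanzi(b1, b2) {
        if (b2 == 127 || b2 < 64 || b2 > 254) return 0
        if (b1 >= 176 && b1 <= 247 && b2 >= 161)            # GBK/2, same as GB2312
            return !(b1 == 215 && b2 > 249)                 # D7FA..D7FE are unassigned
        if (b1 >= 129 && b1 <= 160) return 1                # GBK/3: 81..A0, 40..FE
        if (b1 >= 170 && b1 <= 254 && b2 <= 160) return 1   # GBK/4: AA..FE, 40..A0
        return 0
    }
    {
        name = $4
        n = length(name)                # byte length under LC_ALL=C
        ok = (n >= 4 && n % 2 == 0)     # at least two 2-byte characters
        for (i = 1; ok && i < n; i += 2)
            ok = is_hanzi(ord[substr(name, i, 1)], ord[substr(name, i + 1, 1)])
        if (!ok) print name " is invalid name at line " NR
    }'
}
```

It could be run as `check_gbk_names < data`, with `data` being a GBK-encoded file in the format shown above.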
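For the UTF-8 case that the post leaves open, one possible starting point is sketched below. It makes several assumptions of its own: the file is UTF-8, a valid name consists of at least two characters, and only the block U+4E00..U+9FFF counts as a Chinese character, so every accepted character is exactly three bytes (E4 B8 80 through E9 BF BF). The function name `check_utf8_names` and the `ord` table are hypothetical. Extension blocks and other multi-byte lengths are deliberately left out, which is where the real complexity of UTF-8 would come in.

```shell
# Hypothetical helper, not from the original post. Assumes UTF-8 input in the
# same record layout and accepts only characters in U+4E00..U+9FFF.
check_utf8_names() {
    LC_ALL=C awk -F'"' '
    BEGIN {
        # Byte-value lookup table; LC_ALL=C makes awk index strings by byte.
        for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
        name = $4
        n = length(name)                # byte length under LC_ALL=C
        ok = (n >= 6 && n % 3 == 0)     # at least two 3-byte characters
        for (i = 1; ok && i <= n; i += 3) {
            b1 = ord[substr(name, i, 1)]
            b2 = ord[substr(name, i + 1, 1)]
            b3 = ord[substr(name, i + 2, 1)]
            # U+4E00 is E4 B8 80, so lead byte E4 needs a second byte >= B8.
            lo2 = (b1 == 228) ? 184 : 128
            if (b1 < 228 || b1 > 233 || b2 < lo2 || b2 > 191 || b3 < 128 || b3 > 191)
                ok = 0
        }
        if (!ok) print name " is invalid name at line " NR
    }'
}
```

It could be run as `check_utf8_names < data` on a UTF-8 copy of the data file.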