This post was last edited by 3227049 on 2010-04-04 10:44
Real-world use cases are rarely that simple:
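- #coding:utf-8
- # Python 2 code. The spaces inside the entities presumably only keep the forum from rendering them; replace() strips them again.
- html="版=&# 29256;=&# x7248;".replace(" ","")
- import re
- # Matches numeric character references: decimal (&#NNN;) or hexadecimal (&#xHHH;).
- _=re.compile('&#(x)?([0-9a-fA-F]+);')
- # to_str1: expects a byte string; each reference is decoded and re-encoded in charset, so the result is a byte string.
- to_str1=lambda s,charset='utf-8':_.sub(lambda result:unichr(int(result.group(2),result.group(1)=='x' and 16 or 10)).encode(charset) ,s)
- # to_str2: decodes s to unicode first (unless it already is unicode) and returns unicode.
- to_str2=lambda s,charset='utf-8':_.sub(lambda result:unichr(int(result.group(2),result.group(1)=='x' and 16 or 10)) ,s if type(s) is unicode else s.decode(charset))
- # to_str3: no charset handling at all; only safe when s is already unicode or pure ASCII.
- to_str3=lambda s:_.sub(lambda result:unichr(int(result.group(2),result.group(1)=='x' and 16 or 10)),s)
- print 'to_str1',to_str1(html)
- print '*'*80
- print 'to_str2',to_str2(html)
- print '*'*80
- print 'to_str3',to_str3(html)
- print '*'*80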
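For anyone reading this on Python 3, here is a rough sketch of the same idea as to_str2 above: decode to text first, then replace the numeric character references. The names `_ENTITY`, `_replace` and `decode_char_refs` are made up for this example and are not from the code above. It also differs from the original regex in one respect, which is my own choice: decimal and hexadecimal digits are matched separately, so something like `&#1a;` is left alone instead of failing in `int()`.

```python
import re

# Hypothetical Python 3 port; all names here are illustrative.
# Group 1 holds hex digits (&#xHHH;), group 2 holds decimal digits (&#NNN;).
_ENTITY = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def _replace(match):
    """Turn one numeric character reference into the character it names."""
    hex_digits, dec_digits = match.groups()
    code = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
    try:
        return chr(code)
    except (ValueError, OverflowError):
        # Code point outside the Unicode range: keep the reference as written.
        return match.group(0)


def decode_char_refs(s, charset='utf-8'):
    """Return s as text with numeric character references decoded.

    Bytes input is decoded with charset first, so the substitution
    always runs on text and never mixes bytes with str.
    """
    if isinstance(s, bytes):
        s = s.decode(charset)
    return _ENTITY.sub(_replace, s)


# Same trick as the original post: the spaces are stripped before decoding.
html = "版=&# 29256;=&# x7248;".replace(" ", "")
print(decode_char_refs(html))
print(decode_char_refs(html.encode('utf-8')))
```

If only standard HTML references need handling, Python 3.4+ also ships `html.unescape` in the standard library, which covers named references such as `&amp;` as well.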