#1

Posted 2009-03-06 04:23

I created a file utf_8.py in PythonWin with the following contents:
# -*- coding : utf-8 -*-
x = '哈'
print repr(x)
print x
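After saving and running it, the output is '\xb9\xfe' followed by 哈.
What puzzles me is this: the file was edited and saved as GBK, and the '\xb9\xfe' in the output is also the GBK encoding of '哈', so why does it compile with a utf-8 declaration? If I remove the utf-8 declaration, I get the non-ASCII error, which suggests that utf-8 really was used at compile time. But if it was used, repr(x) should print the UTF-8 encoding of '哈', which is '\xe5\x93\x88'.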
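For reference, the two byte sequences mentioned here can be checked directly. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.5), and the variable names are only illustrative. It also checks whether the GBK bytes happen to be valid UTF-8, since that bears on whether a real UTF-8 decode of this file could succeed.

```python
# The character 哈 written as an escape, so this file's own encoding does not matter.
ch = "\u54c8"

gbk_bytes = ch.encode("gbk")
utf8_bytes = ch.encode("utf-8")
print(repr(gbk_bytes))   # b'\xb9\xfe'
print(repr(utf8_bytes))  # b'\xe5\x93\x88'

# Try to interpret the GBK bytes as UTF-8.
try:
    gbk_bytes.decode("utf-8")
    gbk_is_valid_utf8 = True
except UnicodeDecodeError:
    gbk_is_valid_utf8 = False
print(gbk_is_valid_utf8)  # False
```

Since the GBK bytes are not valid UTF-8, a file saved as GBK would be expected to fail a strict UTF-8 decode rather than pass through it.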

If I change the source file to:
x = '\xb9\xfe'
print repr(x)
print x
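the result is the same as above, which seems to show that compilation used utf-8 and turned '哈' into '\xb9\xfe'. In other words, the GBK '哈' was converted via utf-8 back into the GBK bytes of '哈', which I find very strange. Does anyone know the answer? Or is there something wrong with my example itself?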
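One point worth checking about the second experiment: the escaped literal '\xb9\xfe' is made up of ASCII characters only in the source file, so it should yield the same two bytes whatever encoding is declared. A small sketch in Python 3 syntax, with illustrative names:

```python
# The source text of the second experiment, kept raw so the backslashes stay literal.
source_line = r"x = '\xb9\xfe'"
is_ascii_source = all(ord(c) < 128 for c in source_line)

# The two bytes that the escape sequence denotes, read back as GBK.
x = b"\xb9\xfe"
decoded = x.decode("gbk")

print(is_ascii_source)
print(decoded == "\u54c8")
```

If that holds, the second experiment does not by itself show that any conversion happened at compile time.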

luffy.deng

#2

Posted 2009-03-06 08:19

http://www.python.org/peps/pep-0263.html

xiaomayi0323

#3

Posted 2009-03-06 09:56

Here is what I get when running it in UliPad:
1.
#coding=utf-8
x = '哈'
print repr(x)
print x

> "C:\Python25\pythonw.exe" -u "C:\Users\asus\Desktop\Untitled 1.py"
'\xe5\x93\x88'
\xe5\x93\x88

2.
#coding=gbk
x = '哈'
print repr(x)
print x

> "C:\Python25\pythonw.exe" -u "C:\Users\asus\Desktop\Untitled 1.py"
'\xb9\xfe'
哈

3.
# -*- coding : gbk -*-
x = '哈'
print repr(x)
print x

> "C:\Python25\pythonw.exe" -u "C:\Users\asus\Desktop\Untitled 1.py"
File "C:\Users\asus\Desktop\Untitled 1.py", line 2
SyntaxError: Non-ASCII character '\xb9' in file C:\Users\asus\Desktop\Untitled 1.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

4.
# -*- coding : utf-8 -*-
x = '哈'
print repr(x)
print x

> "C:\Python25\pythonw.exe" -u "C:\Users\asus\Desktop\Untitled 1.py"
File "C:\Users\asus\Desktop\Untitled 1.py", line 2
SyntaxError: Non-ASCII character '\xb9' in file C:\Users\asus\Desktop\Untitled 1.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

xiaomayi0323

#4

Posted 2009-03-06 09:58

UliPad does not recognize this form:
# -*- coding : gbk/utf-8 -*-
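
A possible explanation for cases 3 and 4, offered as a sketch rather than a confirmed diagnosis: PEP 263 says the declaration line must match a regular expression in which `coding` is followed immediately by `:` or `=`. The pattern below is the one given in the PEP; the helper name `declared_encoding` is made up for illustration, and the code uses Python 3 syntax.

```python
import re

# Pattern given in PEP 263 for the first or second line of a source file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(line):
    """Return the encoding named on a comment line, or None if it does not match."""
    m = CODING_RE.match(line)
    return m.group(1) if m else None

samples = [
    "#coding=utf-8",
    "#coding=gbk",
    "# -*- coding: gbk -*-",
    "# -*- coding : gbk -*-",    # space before the colon
    "# -*- coding : utf-8 -*-",  # space before the colon
]
for line in samples:
    print(repr(line), "->", declared_encoding(line))
```

With a space before the colon the line does not match the pattern, which is consistent with the "no encoding declared" error reported in cases 3 and 4.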

Levin_1221

#5

Posted 2009-03-06 11:27

Reply to #4 xiaomayi0323

PythonWin does recognize #-*-coding:utf-8 -*-.
Could it be that Python's parsing on Windows differs from the one on Linux?

Levin_1221

#6

Posted 2009-03-06 11:29

Reply to #2 luffy.deng

I have read that document. It describes the parsing and compilation process:
   1. read the file
   2. decode it into Unicode assuming a fixed per-file encoding
   3. convert it into a UTF-8 byte string
   4. tokenize the UTF-8 content
   5. compile it, creating Unicode objects from the given Unicode data
      and creating string objects from the Unicode literal data
      by first reencoding the UTF-8 data into 8-bit string data
      using the given file encoding
But going by that explanation I cannot account for the problem I ran into above, which is exactly why I find it so strange.
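
The five steps quoted from the PEP can be modelled for a single plain string literal. This is only a sketch of the PEP's description, not of the interpreter's actual code path; `literal_bytes` is a hypothetical helper, and the code uses Python 3 syntax.

```python
def literal_bytes(raw, file_encoding):
    """Model PEP 263 steps 2-5 for the bytes of one plain string literal."""
    text = raw.decode(file_encoding)  # step 2: decode using the declared encoding
    utf8 = text.encode("utf-8")       # step 3: convert to a UTF-8 byte string
    # step 4 (tokenizing) leaves the bytes of the literal unchanged
    return utf8.decode("utf-8").encode(file_encoding)  # step 5: re-encode

# GBK file with a gbk declaration, and UTF-8 file with a utf-8 declaration.
gbk_case = literal_bytes(b"\xb9\xfe", "gbk")
utf8_case = literal_bytes(b"\xe5\x93\x88", "utf-8")

# GBK file with a utf-8 declaration, as in post #1.
try:
    literal_bytes(b"\xb9\xfe", "utf-8")
    mismatch_result = "ok"
except UnicodeDecodeError:
    mismatch_result = "decode error"

print(repr(gbk_case), repr(utf8_case), mismatch_result)
```

Under this model a matching declaration gives back the file's original bytes, while GBK bytes under a utf-8 declaration fail at step 2, so the '\xb9\xfe' output reported in post #1 is not what the model alone predicts.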