luofeiyu_cu posted on 2014-05-22 06:47

How do I read this file?

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:12:08) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> file="test"
>>> x1=open(file,"r",encoding="gb2132").read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown encoding: gb2132
>>> x2=open(file,"r",encoding="gbk").read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa9 in position 69808: illegal multibyte sequence
>>> x3=open(file,"r",encoding="utf-8").read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\python33\lib\codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte
>>>

The test file is here; please see the attachment.

q1208c posted on 2014-05-22 08:27

$ file 600005.Txt
600005.Txt: ISO-8859 text, with very long lines, with CRLF, CR, LF line terminators

This file is already in an 8859 encoding.
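
A minimal sketch of one way to approach this, not a definitive fix. Two things in the session above are worth noting: "gb2132" is not a codec name Python knows (the registered name is "gb2312"), and the gbk attempt failed at position 69808 rather than at position 0, so the file may be mostly GBK with a few stray bytes. The helper names `decode_with_fallback` and `read_with_fallback` and the candidate encoding list are assumptions made up for illustration; nothing here has been checked against the attached file.

```python
from pathlib import Path

# Hypothetical candidate list: adjust to whatever encodings are plausible.
CANDIDATES = ("utf-8", "gb2312", "gbk", "gb18030")


def decode_with_fallback(raw, encodings=CANDIDATES):
    """Try each encoding in order; return (text, encoding_used).

    If every candidate fails, fall back to latin-1, which maps each
    byte to one character and therefore never raises. The result may
    be mojibake, but no data is lost and it can be re-encoded later.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec rejected some byte; try the next one
    return raw.decode("latin-1"), "latin-1"


def read_with_fallback(path, encodings=CANDIDATES):
    """Read the file as bytes, then decode with the first codec that works."""
    # Reading bytes leaves the mixed CRLF/CR/LF line endings untouched.
    return decode_with_fallback(Path(path).read_bytes(), encodings)
```

Calling `text, enc = read_with_fallback("test")` shows which codec accepted the whole file. If only a handful of bytes are bad, `open(file, "r", encoding="gbk", errors="replace")` is another option: it substitutes U+FFFD for the undecodable bytes instead of raising.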