Python FAQ 003: Getting an encoding error when reading a file (Python discussion board, FishC Forum)

zltzlt posted on 2020-3-3 18:17:05


Last edited by 一个账号 on 2020-3-18 21:22

Python FAQ 003

Question

Why can't I read the contents of my file?

>>> f = open(r'E:\a.txt')
>>> print(f.read())
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print(f.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

Answer

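This kind of error is usually caused by a mismatch between the encoding the file was saved with and the encoding Python uses by default when opening it.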

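On Simplified Chinese Windows, the encoding Python uses by default when opening a file is GBK (i.e. cp936). More generally, when no encoding is given, open() falls back to the platform's locale encoding, so the default differs from system to system.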
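To see which default your own interpreter uses, here is a small sketch (default_encodings is a hypothetical helper name). It compares locale.getpreferredencoding(False) with the encoding of a file opened without the encoding argument, and normalises both through codecs.lookup. Python treats cp936 as an alias of its gbk codec, which is why this thread uses the two names interchangeably.

```python
import codecs
import locale
import os
import tempfile

def default_encodings():
    """Return (locale default, encoding of a file opened without encoding=),
    both normalised to Python's canonical codec names."""
    # What open() falls back to when the encoding argument is omitted
    preferred = locale.getpreferredencoding(False)
    # An empty temporary file gives us something harmless to open
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path) as f:  # no encoding argument, on purpose
            actual = f.encoding
    finally:
        os.remove(path)
    return codecs.lookup(preferred).name, codecs.lookup(actual).name

preferred, actual = default_encodings()
print("locale default:", preferred)
print("open() without encoding:", actual)
```

The printed names depend on the OS and locale, so run it on the machine where the error occurs.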

In most cases it is enough to specify UTF-8 as the encoding when opening the file:

>>> f = open(r'E:\a.txt', encoding='utf-8')  # add the encoding argument
>>> print(f.read())
123123123123123
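Why byte 0xbf at position 2? One plausible explanation (an assumption, not confirmed in the thread) is that the file was saved as UTF-8 with a byte order mark (BOM), which older versions of Notepad add. The UTF-8 BOM is the three bytes EF BB BF: GBK reads EF BB as one double-byte character and then fails at BF because the following byte ('1', 0x31) is not a valid second byte. The sketch below rebuilds those bytes in memory. It also shows that 'utf-8' keeps the BOM as the invisible character '\ufeff', while Python's 'utf-8-sig' codec strips it.

```python
# Bytes of "123123123123123" saved as UTF-8 with a BOM (assumed file content)
data = b"\xef\xbb\xbf" + b"123123123123123"

try:
    data.decode("gbk")  # what open() does by default on a cp936 system
except UnicodeDecodeError as e:
    print(e)  # 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

print(repr(data.decode("utf-8")))      # '\ufeff123123123123123' (BOM kept)
print(repr(data.decode("utf-8-sig")))  # '123123123123123' (BOM stripped)
```

For a real file that means open(r'E:\a.txt', encoding='utf-8-sig'), which also reads UTF-8 files that have no BOM.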

If that still raises an error, change the file's encoding as follows:

1. Open the file in Notepad and choose File --> Save As.
2. In the Save As dialog, set the "Encoding" option in the lower-right corner to UTF-8.

After that, the file opens normally with the following code:

with open(r'E:\a.txt', encoding='utf-8') as f:  # E:\a.txt is the file path; change it as needed
    print(f.read())
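The Notepad re-save can also be done from Python. Here is a minimal sketch, assuming the file is currently stored as GBK. resave_as_utf8 is a hypothetical helper name, and it is safer to try it on a copy of the file first. Unlike the "UTF-8" option in older versions of Notepad, it writes no BOM.

```python
import os
import tempfile

def resave_as_utf8(path, src_encoding="gbk"):
    """Hypothetical helper: re-encode a text file to UTF-8 in place.

    src_encoding must be the encoding the file is actually stored in;
    a wrong guess raises UnicodeDecodeError before anything is written.
    """
    # newline="" keeps the original line endings untouched
    with open(path, encoding=src_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

# Usage: create a GBK file, convert it, then read it back as UTF-8
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="gbk") as f:
    f.write("鱼C论坛\n")
resave_as_utf8(path, "gbk")
with open(path, encoding="utf-8") as f:
    content = f.read()
os.remove(path)
print(content)
```

Reading the whole file into memory first means the original is only overwritten after it has been decoded successfully.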

一个账号 posted on 2020-3-3 18:24:02

Last edited by 一个账号 on 2020-3-3 18:26

To add: by default Python opens files with the cp936 encoding.

zltzlt posted on 2020-3-3 18:24:33

Last edited by zltzlt on 2020-3-3 18:25

一个账号 posted on 2020-3-3 18:24
To add: by default Python opens files with the GB936 encoding.

Wrong, it's GBK (cp936).

Thanks for adding it, though.

一个账号 posted on 2020-3-3 18:27:14

zltzlt posted on 2020-3-3 18:24
Wrong, it's GBK (cp936).

Thanks for adding it, though.

My mistake, it's cp936:

>>> f = open("d:/test.txt")
>>> f
<_io.TextIOWrapper name='d:/test.txt' mode='r' encoding='cp936'>

一个账号 posted on 2020-3-7 16:35:32

The file isn't necessarily UTF-8, though:

>>> f = open("d:/test.txt")
>>> f.read()
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xfe in position 0: illegal multibyte sequence
>>> f = open("d:/test.txt", encoding="utf-8")
>>> f.read()
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    f.read()
  File "C:\Users\Angel\AppData\Local\Programs\Python\Python38\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 0: invalid start byte
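Byte 0xfe at position 0 is a strong hint. FE FF is the byte order mark of UTF-16 big-endian (what Notepad's "Unicode big endian" option writes), so such a file is most likely UTF-16, which neither GBK nor UTF-8 can read. Below is a minimal sketch that picks a codec from the first bytes of the file. guess_encoding_from_bom and read_text are hypothetical helper names. Only the BOM constants come from the standard codecs module. A file without a BOM still needs its encoding to be known in advance.

```python
import codecs

# BOM prefixes, longest first so UTF-32 LE (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding_from_bom(raw, default=None):
    """Hypothetical helper: return a codec name based on a leading BOM, else default."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default

def read_text(path, default="utf-8"):
    """Hypothetical helper: open a text file using the codec its BOM suggests."""
    # Read the first 4 bytes in binary mode so nothing is decoded yet
    with open(path, "rb") as f:
        head = f.read(4)
    encoding = guess_encoding_from_bom(head, default)
    # The utf-16 / utf-32 / utf-8-sig codecs consume the BOM themselves
    with open(path, encoding=encoding) as f:
        return f.read()

# Usage: bytes shaped like the file in the traceback (UTF-16 BE with a BOM)
sample = codecs.BOM_UTF16_BE + "123".encode("utf-16-be")
enc = guess_encoding_from_bom(sample)
print(enc, repr(sample.decode(enc)))  # utf-16 '123'
```

With the file from the post, read_text("d:/test.txt") would then open it as UTF-16 instead of falling back to GBK.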
