Python raises an error when reading a txt file
After open(), calling read() always fails with:
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 11: illegal multibyte sequence
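What is causing this? It behaves differently from what the video shows, and I can't get any further. Any guidance would be appreciated, thanks!

The 'gbk' codec can't decode the file: open() was called without an encoding, so it fell back to the system default, gbk.
Try passing the encoding to open():
f = open(xxx, encoding='utf-8')

Use cchardet to detect the character encoding (high accuracy)
cchardet is more accurate and faster than chardet.
Detecting a file's encoding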
import cchardet as chardet

# First detect the file's encoding from its raw bytes
with open("test.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with the encoding and the confidence, e.g.
# {'encoding': 'UTF-8', 'confidence': 0.9900000095367432}
enc = chardet.detect(msg)
print(enc)
enc = enc['encoding']

# Then open the file with the detected encoding
with open("test.txt", "r", encoding=enc) as f:
    print(f.read())
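
Detection is only a guess, and detect() can report None as the encoding when it cannot decide. As a rough sketch using only the standard library, a helper can try a few likely encodings in order instead. The name read_text_with_fallback and the candidate list are assumptions made up for illustration; they are not part of cchardet.

```python
from pathlib import Path


def read_text_with_fallback(path, candidates=("utf-8-sig", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is an assumption and should be
    adjusted to the encodings you actually expect to meet.
    """
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError(f"none of {candidates} could decode {path}")
```

UTF-8 is tried first because it is strict and rejects most non-UTF-8 input, while gbk accepts many byte sequences that are not really GBK text, so the order of the candidates matters.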
Detecting a web page's encoding
import requests
import cchardet

res = requests.get('http://www.baidu.com/')
rawdata = res.content  # raw response bytes, before any decoding
enc = cchardet.detect(rawdata)
enc = enc['encoding']
print(enc)

逃兵 posted on 2021-10-22 10:05
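The 'gbk' codec can't decode the file.
Try passing the encoding to open():
f = open(xxx, encoding='utf-8')

Thanks for the guidance. After searching on Baidu I found that a txt file can be saved in different encodings, so I re-saved mine as ANSI.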
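
For reference, both fixes in this thread come down to the same thing: the encoding used for reading has to match the one the file was saved in. Below is a minimal sketch under two assumptions that are not stated in the thread: that the machine is a Chinese-locale Windows system, where the "ANSI" save option means GBK, and that the original file was UTF-8. The file name demo.txt and the sample text are made up.

```python
import os
import tempfile

text = "你好, Python"

# Work in a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Fix 1: the file is saved as UTF-8, so name that encoding when reading.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    from_utf8 = f.read()

# Fix 2: re-save the file as GBK (assumed to be what "ANSI" means here)
# and read it back as GBK.
with open(path, "w", encoding="gbk") as f:
    f.write(text)
with open(path, encoding="gbk") as f:
    from_gbk = f.read()

# A byte that is not valid GBK, such as the 0x80 from the traceback,
# makes the gbk codec raise UnicodeDecodeError.
try:
    b"\x80abc".decode("gbk")
    mismatch_raises = False
except UnicodeDecodeError:
    mismatch_raises = True
```

Passing encoding= explicitly is usually the more portable choice, because the default that open() picks depends on the system the script runs on.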