Writing a file raises "illegal multibyte sequence"

石头怪 · Posted 2017-6-7 17:50:20

import urllib.request
url = 'http://dzs.qisuu.com/txt/%s' % urllib.parse.quote('干物妹也要当漫画家.txt')
response = urllib.request.urlopen(url)

html = response.read().decode('gbk').encode('utf-8')

with open('11.txt', 'wb') as f:
    f.write(html)

Error message:
Traceback (most recent call last):
  File "C:\Users\memedai\Desktop\urlopen.py", line 5, in <module>
    html = response.read().decode('gbk').encode('utf-8')
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 1128516: illegal multibyte sequence
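
A minimal sketch of what the traceback reports, using a made-up byte string rather than the actual download: the strict `gbk` decoder stops at the first byte sequence it cannot map, and the exception records where that happened. The helper name `find_bad_byte` and the sample data are assumptions for illustration only.

```python
def find_bad_byte(data, encoding="gbk"):
    """Return (position, byte value) of the first undecodable byte, or None."""
    try:
        data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Hypothetical sample: ASCII text with a stray 0x80 byte in the middle.
sample = b"abc\x80def"
print(find_bad_byte(sample))

# errors="replace" substitutes U+FFFD instead of raising, which helps when
# inspecting the text around the offending byte.
print(sample.decode("gbk", errors="replace"))
```

Running the same helper on the real download would show whether the failure is one isolated byte or a sign that the file is not GBK at all.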

newu · Posted 2017-6-7 21:14:10

I haven't learned Python, but you could search Baidu for "illegal multibyte sequence"
and work out the answer from the results:
https://www.baidu.com/baidu?wd=i ... e_4_dg&ie=utf-8

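2012277033 · Posted 2017-6-9 00:07:53

First confirm whether the site you are requesting really is GBK-encoded, and only decode once you know the encoding.
I tried changing gbk to iso8859-1. The script runs, but the resulting file is garbled, so you probably need to determine the site's actual encoding before you decode.
(Attachment: 捕获.PNG)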
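
A small sketch of why iso8859-1 "runs" yet gives garbled output: that codec maps every possible byte to exactly one character, so decoding can never fail, but the characters are not the ones the author wrote. The sample string below stands in for the downloaded bytes and is an assumption, not the real file content.

```python
text = "干物妹也要当漫画家"
raw = text.encode("gbk")          # stand-in for bytes served by a GBK site

wrong = raw.decode("iso8859-1")   # never raises: one character per byte
print(len(raw), len(wrong))       # equal lengths, each byte became a character
print(wrong == text)              # the decoded text is not the original

# The original bytes are still recoverable, so nothing has been lost yet.
recovered = wrong.encode("iso8859-1").decode("gbk")
print(recovered == text)
```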

2012277033 · Posted 2017-6-9 00:19:59

OK, I tried it: you don't need to re-encode at all. The downloaded bytes can be written as they are, so just remove the decode and encode steps.
See 捕获.PNG.

import urllib.request
url = 'http://dzs.qisuu.com/txt/%s' % urllib.parse.quote('干物妹也要当漫画家.txt')
response = urllib.request.urlopen(url)
#html = response.read().decode('utf-8').encode('gb2312')
with open('11.txt', 'wb') as f:
    f.write(response.read())
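
The same idea (write the bytes untouched) can also be done in chunks, which may be gentler on memory for a file of a megabyte or more. This is only a sketch: `save_raw` is a hypothetical helper name, and it accepts any binary file-like object, which the object returned by `urllib.request.urlopen` is.

```python
import shutil


def save_raw(source, path, chunk_size=64 * 1024):
    """Copy bytes from a binary file-like object to `path` without decoding."""
    with open(path, "wb") as f:
        shutil.copyfileobj(source, f, chunk_size)


# Possible usage with the URL built in the post above (not executed here):
#   with urllib.request.urlopen(url) as response:
#       save_raw(response, '11.txt')
```

Because nothing is decoded, the saved file keeps whatever encoding the server used, and a text editor has to open it with that encoding.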

1968609663 · Posted 2017-6-9 07:03:04

First use chardet to determine the encoding of the page you are requesting.
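
A rough sketch of that suggestion. `chardet` is a third-party package, so the import is optional here; `chardet.detect` returns a dict whose `encoding` entry is only a guess and may be `None`. The fallback `first_decodable` and its candidate list are assumptions added for illustration, not something from the thread.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None


def detect_with_chardet(data):
    """Return chardet's guess for `data`, or None if chardet is not installed."""
    if chardet is None:
        return None
    return chardet.detect(data)["encoding"]  # the guess itself may be None


def first_decodable(data, candidates=("utf-8", "gb18030")):
    """Return the first candidate encoding that decodes `data` without errors."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Either function would be called on the bytes from `response.read()`. A result of `None` from both suggests the file mixes encodings or contains damaged bytes, in which case writing the raw bytes is the safer route.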
