[Solved] Encoding problem when saving web page content to a .txt file

云飘飘 · posted on 2016-8-6 16:51:30

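import urllib.request
import chardet

def main():
    i = 0
    # Read one URL per line from urls.txt
    with open('urls.txt', 'r') as f:
        urls = f.read().splitlines()
    for each_url in urls:
        response = urllib.request.urlopen(each_url)
        html = response.read()  # raw bytes of the page
        # Guess the page's encoding; chardet returns None when it cannot tell,
        # so fall back to UTF-8 (an assumed default, not in the original post)
        encode = chardet.detect(html)['encoding'] or 'utf-8'
        # GBK is a superset of GB2312, so widen the guess to keep more characters
        if encode == 'GB2312':
            encode = 'GBK'
        i += 1
        filename = 'url_%d.txt' % i
        # Decode the bytes to str, then write the str out in the same encoding
        with open(filename, 'w', encoding=encode) as each_file:
            each_file.write(html.decode(encode, 'ignore'))

if __name__ == '__main__':
    main()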
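The GB2312 to GBK substitution in the script can be checked with Python's built-in codecs alone (no network or chardet needed). A minimal sketch; the character 镕 is only an illustrative example of a character that GBK covers but GB2312 does not:

```python
# GBK is a superset of GB2312: every GB2312 character also encodes in GBK,
# but some characters (such as 镕) exist only in GBK.
text = '镕'

gbk_bytes = text.encode('gbk')  # succeeds: 镕 is in GBK
print(gbk_bytes)

try:
    text.encode('gb2312')  # fails: 镕 is not in GB2312
except UnicodeEncodeError as exc:
    print('gb2312 cannot encode it:', exc.reason)

# Decoding the GBK bytes with the narrower gb2312 codec fails as well,
# which is why widening a GB2312 guess to GBK avoids errors (or, with
# 'ignore', characters that silently disappear).
try:
    gbk_bytes.decode('gb2312')
except UnicodeDecodeError:
    print('gb2312 cannot decode these bytes')

print(gbk_bytes.decode('gbk') == text)  # True
```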

The each_file.write(...) line writes Unicode text, yet the open(...) line just before it sets encoding=encode.
(Attached screenshot: 捕获.JPG)

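If I don't set encoding, isn't the default Unicode? Then why set it at all?
Also, the phrase "used to decode or encode the file": does it mean I pass in one encoding and the file then randomly picks between that encoding and Unicode?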

Best answer

SixPy

2016-8-6 18:13:26

Last edited by SixPy on 2016-8-6 18:16

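You could talk about this topic for three days and three nights and still not finish~
Go look up "character set" and "character encoding" yourself.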

encode converts Unicode text into a specified encoding, such as utf-8, gbk, and so on.
decode does the reverse: it converts data in a specified encoding back into Unicode.
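A minimal sketch of the encode/decode round trip described above, using only built-in codecs; the sample text is arbitrary:

```python
text = '编码'  # a str: a sequence of Unicode characters

utf8_bytes = text.encode('utf-8')  # str -> bytes in UTF-8
gbk_bytes = text.encode('gbk')     # str -> bytes in GBK

# Same text, different byte sequences: each CJK character takes
# 3 bytes in UTF-8 but 2 bytes in GBK.
print(utf8_bytes)
print(gbk_bytes)
print(len(utf8_bytes), len(gbk_bytes))  # 6 4

# decode reverses encode, but only with the codec that produced the bytes;
# a different codec typically raises UnicodeDecodeError or yields garbled text.
print(utf8_bytes.decode('utf-8') == text)  # True
print(gbk_bytes.decode('gbk') == text)     # True
```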

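In Python 3, a string (str) always holds Unicode text; there is no "GBK string" or "UTF-8 string".
Text in any specific encoding can only exist as a byte object (bytes).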
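To make the str/bytes split concrete, and to show what open()'s encoding argument does: in text mode, write() takes a str and encodes it to bytes with the given codec, and read() decodes with that same codec, so nothing is chosen at random. When encoding is omitted, open() uses a platform-dependent default (the value locale.getpreferredencoding(False) reports) rather than "Unicode", because a file on disk always holds bytes in some concrete encoding. A minimal sketch; the temporary file name demo.txt and the sample text are arbitrary:

```python
import locale
import os
import tempfile

text = '保存网页'
print(type(text), type(text.encode('utf-8')))  # <class 'str'> <class 'bytes'>

# With encoding omitted, text-mode open() falls back to this platform default
print('default text encoding here:', locale.getpreferredencoding(False))

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    for codec in ('utf-8', 'gbk'):
        # Text mode: write() takes a str and encodes it with `codec`
        with open(path, 'w', encoding=codec) as f:
            f.write(text)
        # Binary mode: read() returns the raw bytes actually stored on disk
        with open(path, 'rb') as f:
            raw = f.read()
        # Text mode read with the same codec decodes back to the original str
        with open(path, 'r', encoding=codec) as f:
            round_trip = f.read()
        results[codec] = (raw, round_trip)
        print(codec, raw, round_trip == text)
```

The same str produces different bytes on disk depending on the codec, which is why the encoding passed when writing has to match the one used later when reading the file back.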
