Encoding/decoding question

两栖类 · posted 2018-7-19 13:36:55

import urllib.request
import chardet

def save(urls):
    i=1
    for name in urls:
        response=urllib.request.urlopen(name)
        html=response.read()
        str1=chardet.detect(html)['encoding'] or 'UTF-8'  # fallback if detection returns None (assumption)
        #print('Detected page encoding: %s'%str1)
        if str1=='GB2312':
            str1='GBK'
        elif str1=='UTF-8-SIG':
            str1='UTF-8'
        # 'ignore': silently drop bytes that are invalid in this encoding
        html=html.decode(str1,'ignore')
        # pass an explicit encoding argument to open()
        f=open("d:\\test\\url_%d.txt"%i,'w',encoding=str1)
        f.write(html)
        f.close()
        i+=1

def main():
    f=open("d:\\test\\urls.txt")
    urls=[]
    while True:
        url=f.readline()
        url=url.strip('\n')
        if url=='':
            break
        urls.append(url)

    f.close()
    save(urls)

if __name__=='__main__':
    main()
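The script remaps chardet's 'GB2312' result to 'GBK'. GBK is a superset of GB2312, and pages labeled GB2312 often contain GBK-only characters, so decoding them as GBK is the safer choice. The sketch below shows that difference using only standard-library codecs; `gb2312_can_decode` is a hypothetical helper name introduced just for this illustration.

```python
def gb2312_can_decode(data):
    """Return True if `data` decodes as strict GB2312 (hypothetical helper)."""
    try:
        data.decode('gb2312')
        return True
    except UnicodeDecodeError:
        return False

# '镕' is in GBK but not in GB2312; '朱' and '基' are in both
text = '朱镕基'
data = text.encode('gbk')  # bytes a page labeled "GB2312" may really contain

print(gb2312_can_decode(data))     # False: strict GB2312 decoding fails
print(data.decode('gbk') == text)  # True: GBK round-trips the text
```

With the `'ignore'` error handler used in the script, a GB2312 decode would not raise here. It would silently lose or mangle such characters instead, which is why mapping to GBK helps.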

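Question 1: On line 18, the file contents come out exactly the same whether or not I pass encoding=str1 to open(). Why is that?
Question 2: How do I decode UTF-8-SIG? Can I just decode it as UTF-8?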
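A small standard-library demo touching both questions, assuming chardet's 'UTF-8-SIG' label means UTF-8 text preceded by a byte-order mark (EF BB BF). For question 2, decoding with 'utf-8' keeps the BOM as a leading U+FEFF character, while the 'utf-8-sig' codec strips it and also accepts BOM-less UTF-8 unchanged. For question 1, when `encoding=` is omitted, open() falls back to a platform-dependent default. On Simplified Chinese Windows that is typically cp936 (GBK), so for a GBK page the bytes written with and without `encoding=str1` can be identical. This is a plausible explanation, not a confirmed diagnosis of the poster's machine.

```python
import tempfile

# Q2: UTF-8 bytes that start with a BOM (EF BB BF)
data = b'\xef\xbb\xbf' + '编码'.encode('utf-8')

as_utf8 = data.decode('utf-8')          # plain UTF-8 keeps the BOM as U+FEFF
as_utf8_sig = data.decode('utf-8-sig')  # 'utf-8-sig' strips the BOM

print(as_utf8.startswith('\ufeff'))      # True
print(as_utf8_sig.startswith('\ufeff'))  # False

# 'utf-8-sig' also decodes BOM-less UTF-8 unchanged
plain = '编码'.encode('utf-8').decode('utf-8-sig')
print(plain == '编码')  # True

# Q1: the encoding open() uses when no encoding= is given
# (platform/locale dependent, e.g. cp936 on Simplified Chinese Windows)
with tempfile.TemporaryFile('w') as f:
    default_encoding = f.encoding
print(default_encoding)

# The same text produces different bytes on disk depending on the encoding
text = '编码解码'
gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')
print(len(gbk_bytes), len(utf8_bytes))  # 8 12
```

If the page really is UTF-8 and the default is cp936, omitting `encoding=` changes the bytes on disk. Many editors auto-detect the encoding, though, so the file can still look the same when opened.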

两栖类 · posted 2018-7-19 19:20:25

?? Why hasn't anyone replied?