UnicodeDecodeError when scraping a web page
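UnicodeDecodeError: 'gb2312' codec can't decode byte 0xc6 in position 9538: illegal multibyte sequence
But the encoding I decode with is the one the site itself declares, and the program runs fine and decodes the page in Pythonista. It only raises the error when I run it in IDLE or PyCharm. Why is that, and is there a fix?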
req.add_header('User-Agent','Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Mobile Safari/537.36')
proxy = random.choice(proxies)
proxy_support = urllib.request.ProxyHandler({'http':proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
url = 'https://www.27270.com/ent/rentiyishu/2019/313823.html'
response = urllib.request.urlopen(url)
html = response.read().decode('gb2312')
print(html)
Could this be related to the header? Several times it worked at first, then started raising the error after a few runs.
Try decode('gb2312', 'ignore').
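For what it's worth, 'ignore' silently drops every byte it cannot decode, so characters can go missing from the page. Pages that declare gb2312 often contain characters that only exist in the larger GBK / GB18030 sets, and Python's gb2312 codec is strict about that. Below is a minimal sketch of a more forgiving decode, assuming the page really is in a GB-family encoding; the helper name decode_html is made up for illustration and is not part of urllib.

```python
def decode_html(raw: bytes) -> str:
    """Decode bytes from a page that declares gb2312 (hypothetical helper).

    Assumption: the page is actually GB-family text. gb18030 is a superset
    of gb2312 and gbk, so it decodes everything gb2312 can, plus more.
    """
    try:
        # Strict decode first, so real corruption is not hidden.
        return raw.decode("gb18030")
    except UnicodeDecodeError:
        # Last resort: keep the text and mark bad bytes with U+FFFD
        # instead of silently dropping them as 'ignore' would.
        return raw.decode("gb18030", errors="replace")
```

In the code above this would replace the last decode step, i.e. html = decode_html(response.read()). If the output still looks garbled, the response may not be GB-encoded text at all, which this sketch does not try to detect.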