Site URL:
http://mm.xmeise.com/xingge/shunv/2634.html
Here is the code I wrote:
# -*- coding: UTF-8 -*-
import urllib.request
import urllib.parse
import json
import os
import urllib.error
import http.client

def webHttp(url, dataType=False, charset="UTF-8"):
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36")
    response = urllib.request.urlopen(req)
    data = response.read()
    if dataType == False:
        try:
            html = data.decode(charset)
        except UnicodeDecodeError:
            html = data.decode("GBK")
    else:
        html = data
    return html

url = "http://mm.xmeise.com/xingge/shunv/2634.html"
html = webHttp(url)
print(html)
Whether I decode with GBK, GB2312, or UTF-8, I always get the following error:
Traceback (most recent call last):
  File "E:/Python/抓美女/抓图片.py", line 17, in webHttp
    html = data.decode(charset);
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/Python/抓美女/抓图片.py", line 26, in <module>
    html = webHttp(url);
  File "E:/Python/抓美女/抓图片.py", line 19, in webHttp
    html = data.decode("GBK");
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence
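Both errors point at the same byte, 0x8b at position 1. That is a strong hint that the body is not text at all: a gzip stream always begins with the magic bytes 0x1f 0x8b, and no text codec (UTF-8, GBK, GB2312) can decode compressed data. The sketch below reproduces this offline, assuming the server sent the GBK page gzip-compressed; the helper name `maybe_gunzip` is hypothetical. It checks for the gzip magic number and decompresses before decoding.

```python
import gzip


def maybe_gunzip(data: bytes) -> bytes:
    """Hypothetical helper: decompress `data` if it starts with the gzip
    magic number (0x1f 0x8b), otherwise return it unchanged."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


# Offline reproduction: a GBK-encoded page that was sent gzip-compressed.
page = "<title>测试</title>".encode("gbk")
raw = gzip.compress(page)

print(raw[:2])  # b'\x1f\x8b': the byte at position 1 is 0x8b, as in the traceback
try:
    raw.decode("utf-8")  # fails for the same reason as in the question
except UnicodeDecodeError as exc:
    print("decoding the compressed bytes fails:", exc.reason)

# Decompress first, then decode with the page's real charset.
html = maybe_gunzip(raw).decode("gbk")
print(html)
```

If the `Content-Encoding` response header of the real site says `gzip`, that confirms this diagnosis.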
However, the same program fetches other sites correctly; only this site has the problem: http://mm.xmeise.com/xingge/shunv/2634.html
Reply: in my test, decoding with gbk works:

import requests
req = requests.get('http://mm.xmeise.com/xingge/shunv/2634.html')
print(req.content.decode('gbk'))
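The requests version most likely works because requests transparently decompresses gzip/deflate response bodies before exposing `content`, whereas `urllib.request.urlopen(...).read()` returns the raw bytes exactly as sent. A urllib-only variant of `webHttp` can undo the compression itself. This is a sketch under assumptions: `web_http` and `decompress_body` are hypothetical names, only gzip and deflate are handled, and the page is assumed to be GBK when no charset is declared or UTF-8 fails.

```python
import gzip
import urllib.request
import zlib
from typing import Optional

# User-Agent copied from the original webHttp.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36")


def decompress_body(data: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo a gzip or deflate Content-Encoding; pass other bodies through."""
    encoding = (content_encoding or "").strip().lower()
    # Also sniff the gzip magic number in case the header is missing.
    if encoding == "gzip" or data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    if encoding == "deflate":
        try:
            return zlib.decompress(data)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(data, -zlib.MAX_WBITS)  # raw deflate
    return data


def web_http(url: str, charset: str = "UTF-8", fallback: str = "GBK") -> str:
    """Fetch `url`, decompress the body if needed, then decode it to text."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as response:
        raw = response.read()
        body = decompress_body(raw, response.headers.get("Content-Encoding"))
        # Prefer the charset declared in Content-Type, if the server sends one.
        declared = response.headers.get_content_charset()
    try:
        return body.decode(declared or charset)
    except (UnicodeDecodeError, LookupError):
        # Same idea as the original: fall back to GBK for Chinese pages.
        return body.decode(fallback, errors="replace")
```

Usage: `print(web_http("http://mm.xmeise.com/xingge/shunv/2634.html"))`. Decompressing before decoding is the key change; which charset finally applies is a separate question that the `declared`/`fallback` logic only approximates.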