After running the script I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 1057: invalid start byte
Here is the full error message:
Traceback (most recent call last):
  File "F:/前主人的工作/test/blackwater.py", line 48, in <module>
    saveimg()
  File "F:/前主人的工作/test/blackwater.py", line 38, in saveimg
    imgurl_list = geturl(url)
  File "F:/前主人的工作/test/blackwater.py", line 13, in geturl
    html = open_url(url).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 1057: invalid start byte
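Here is the source code. It worked when I used it to scrape Baidu Tieba, but after I changed it a bit to scrape 1688 I get the error above. Any pointers would be appreciated, thanks!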
import urllib.request
import os


def open_url(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36')
    # Note: req (with the User-Agent header) is built but never used below;
    # urlopen(req) was probably intended.
    response = urllib.request.urlopen(url)
    html = response.read()
    return html


def geturl(url):
    # This is the line that raises UnicodeDecodeError on a non-UTF-8 page.
    html = open_url(url).decode('utf-8')
    imgurl_list = []
    a = html.find('<img src="')
    while a != -1:
        # Look for '.jpg' within 255 characters of the opening tag.
        b = html.find('.jpg', a, a+255)
        if b != -1:
            imgurl_list.append(html[a+10: b+4])
        else:
            b = a + 10
        a = html.find('<img src="', b)
    return imgurl_list
    '''
    for i in imgurl_list :
        print(i)
    '''


def saveimg(file='WK'):
    os.mkdir(file)
    os.chdir(file)
    url = 'https://item.taobao.com/item.htm?spm=a230r.1.14.18.58a078ebX7sZ7F&id=574781588980&ns=1&abbucket=16#detail'
    #geturl(url)
    imgurl_list = geturl(url)
    img_len = len(imgurl_list)
    for each in range(img_len):
        filename = imgurl_list[each].split('/')[-1]
        with open(filename, 'wb') as f:
            html = open_url(imgurl_list[each])
            f.write(html)


if __name__ == '__main__':
    saveimg()
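For what it's worth, the "invalid start byte" message can be reproduced without touching the network. The snippet below is only a sketch: the sample string 欢迎 is my own pick (it is not taken from the page), chosen because its GBK encoding begins with byte 0xbb, the same byte the traceback complains about.

```python
# Sample text is an assumption for illustration; it is not from the actual page.
sample = "欢迎"
raw = sample.encode("gbk")  # bytes as a GBK-encoded page would carry them

try:
    raw.decode("utf-8")     # the same kind of call geturl() makes
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# In UTF-8, bytes 0x80-0xBF may only continue a multi-byte character,
# so a sequence that starts with 0xbb is rejected.
print(hex(raw[0]), utf8_error.reason if utf8_error else "decoded fine")

# GB18030 was designed as an extension of GBK, so it reads these bytes correctly.
recovered = raw.decode("gb18030")
print(recovered == sample)
```

So the bytes themselves are fine; they are just being read with the wrong codec.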
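Taobao's pages are encoded in GBK, so try the approach from the reply above. If that still doesn't work, the problem is with the DOS (console) window.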
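On the "DOS window" remark: my reading (an assumption, since the reply does not spell it out) is that printing the decoded text to a Windows console can fail with UnicodeEncodeError when the console's code page cannot represent some character. A rough sketch of one workaround follows; console_safe is a made-up helper name, not something from the original script.

```python
import sys


def console_safe(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    # Fall back to the console's own encoding, then to utf-8 (an assumed default).
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# Forcing ascii here only to show the replacement on any machine.
print(console_safe("price: 99 \u20ac", encoding="ascii"))
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") should have a similar effect for every print call without a helper.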
Change it to this:
html = open_url(url).decode('gb18030')
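Hard-coding gb18030 fixes this page, but the same script would then choke on a UTF-8 site such as the Tieba pages it used to handle. Here is a hedged sketch of a more forgiving version. decode_html and fetch_html are names I made up, and the fallback order (declared charset, then utf-8, then gb18030) is just one reasonable choice, not the only one.

```python
import urllib.request

# Assumption: these two encodings cover the sites being scraped.
FALLBACK_ENCODINGS = ("utf-8", "gb18030")


def decode_html(raw, declared=None):
    """Decode page bytes, trying the declared charset first, then fallbacks."""
    candidates = ([declared] if declared else []) + list(FALLBACK_ENCODINGS)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: keep going, marking undecodable bytes instead of raising.
    return raw.decode("gb18030", errors="replace")


def fetch_html(url):
    """Download a page and decode it using the charset from its headers, if any."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as response:
        declared = response.headers.get_content_charset()  # None if not declared
        return decode_html(response.read(), declared)
```

In geturl, the line html = open_url(url).decode('utf-8') would then become html = fetch_html(url). utf-8 is tried before gb18030 on purpose: strict UTF-8 decoding rejects most GBK byte sequences, whereas gb18030 will often accept UTF-8 bytes and quietly produce garbage.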