Last edited by ddtufoer on 2016-8-7 20:21
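The code is simple: it fetches a page, collects the image addresses into a list, and prints them. But it keeps failing with a UnicodeDecodeError. I hope some kind fellow members can help me sort this out.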
    import urllib.request
    import os
    url='http://www.1300k.com/shop/goodsDetail.html?f_goodsno=215023276979'
    def url_open(url):
        req=urllib.request.Request(url)
        req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36')
        response=urllib.request.urlopen(url)
        read=response.read()

        return read
    print(url_open(url))
    if __name__=='__main__':
        html=url_open(url).decode('utf-8','ignore')
        img_addrs=[]
        a=html.find('scr=')
        while a!=-1:
            b=html.find('.jpg',a,a+255)
            if b!=-1:
                img_addrs.append(html[a+5:b+4])
            else:
                b=a+5
            a=html.find('scr=',b)
        for each in img_addrs:
            print(each)
Here is the error:
    Traceback (most recent call last):
      File "E:/1.py", line 13, in <module>
        html=url_open(url).decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte
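The message says the very first byte of the response, 0xc1, can never start a UTF-8 sequence, which suggests the page is simply not encoded as UTF-8. One possible explanation, which is an assumption and not something the post confirms, is a legacy Korean encoding such as EUC-KR. A minimal sketch of trying several candidate encodings in turn follows; the helper name `decode_with_fallback` and the candidate list are hypothetical.

```python
def decode_with_fallback(data, encodings=("utf-8", "euc_kr")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is a guess for illustration, not something
    confirmed for the page in the post.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # No candidate decoded cleanly: fall back to the first one and
    # substitute U+FFFD for the bytes that cannot be decoded.
    return data.decode(encodings[0], errors="replace"), encodings[0]


# The two leading bytes reported in the tracebacks of the post.
text, used = decode_with_fallback(b"\xc1\xa2")
print(used)
```

If the server declares a charset, reading it from the response (for example with `response.headers.get_content_charset()`) would be more reliable than guessing.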
Then, after reading other members' posts, I added the gzip module:
    import urllib.request
    import re, gzip, io
    url='http://www.1300k.com/shop/goodsDetail.html?f_goodsno=215023276979'
    req=urllib.request.Request(url)
    req.header=('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36')
    response=urllib.request.urlopen(url)
    buf = io.BytesIO(response.read())
    gzip_f = gzip.GzipFile(fileobj=buf)
    content = gzip_f.read()
    print(content.decode("UTF-8"))
Here is the error:

    Traceback (most recent call last):
      File "E:/2.py", line 10, in <module>
        content = gzip_f.read()
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35-32\lib\gzip.py", line 274, in read
        return self._buffer.read(size)
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35-32\lib\gzip.py", line 461, in read
        if not self._read_gzip_header():
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35-32\lib\gzip.py", line 409, in _read_gzip_header
        raise OSError('Not a gzipped file (%r)' % magic)
    OSError: Not a gzipped file (b'\xc1\xa2')
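The OSError reports the first two bytes of the body as b'\xc1\xa2', while a gzip stream always begins with b'\x1f\x8b'. So the response does not appear to be compressed at all, and these are the same leading bytes that the UTF-8 decoder rejected earlier. A rough sketch of decompressing only when the data really looks like gzip is shown below; `maybe_gunzip` is a hypothetical helper name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(data):
    """Decompress data only if it carries the gzip magic number."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data  # not gzip: hand the bytes back untouched


print(maybe_gunzip(gzip.compress(b"hello")))
print(maybe_gunzip(b"\xc1\xa2"))
```

Checking `response.headers.get("Content-Encoding")` before decompressing would be another reasonable guard.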
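Two details in the listings are separate from the decoding error but would likely matter once it is solved. Both listings call `urllib.request.urlopen(url)` instead of passing `req`, so the User-Agent is never sent. The first listing also searches for `'scr='` where `'src='` was probably intended. As an alternative to the manual `find` loop, here is a sketch that collects `.jpg` addresses with the standard `html.parser` module; the class name, function name, and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag ending in .jpg."""

    def __init__(self):
        super().__init__()
        self.img_addrs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().endswith(".jpg"):
                self.img_addrs.append(value)


def extract_jpg_addrs(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.img_addrs


# Hypothetical sample markup, not taken from the page in the post.
sample = '<p><img src="http://example.com/a.jpg"><img src="/b.png"><IMG SRC="c.JPG"></p>'
for each in extract_jpg_addrs(sample):
    print(each)
```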