Scraping newbie asking for help: urllib.request.urlretrieve(each, filename, None) raises an error
I have already extracted the file name from the URL: filename = each.split("/")[-1]
print(filename)
But the following line fails to save the file locally and raises an error:
urllib.request.urlretrieve(each, filename, None)
Error message:
Traceback (most recent call last):
File "D:/pycharm/小甲鱼/2059正则1.py", line 23, in <module>
get_img(open_url(url))
File "D:/pycharm/小甲鱼/2059正则1.py", line 19, in get_img
urllib.request.urlretrieve(each,filename,None)
File "C:\Program Files\Python37\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process finished with exit code 1
I'm new to scraping and just copying along with the tutorial, so I don't know what the problem is or how to fix it. Any help would be appreciated!
Here is the code:

import re
import urllib.request

def open_url(url):  # fetch the page
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36')
    page = urllib.request.urlopen(req)
    html = page.read().decode('utf-8')
    return html

def get_img(html):
    p = r'https://i\.meizitu\.net/thumbs/2019/06/18\d\d\d\d_\d\d\.\d\d_236\.jpg'  # dots escaped so they match literally
    imglist = re.findall(p, html)  # still contains a lot of irrelevant matches
    for each in imglist:
        filename = each.split("/")[-1]  # extract the file name
        urllib.request.urlretrieve(each, filename, None)

if __name__ == '__main__':
    url = 'https://www.mzitu.com/'
    get_img(open_url(url))
Thanks in advance, the code is above, much appreciated! @wp231957

小强森 posted on 2019-6-11 16:46:
> Thanks in advance, the code is above, much appreciated! @wp231957

The mzitu site added anti-scraping measures long ago, so 小甲鱼's original code no longer works. Put it this way: your request headers are missing the Referer header, so the server decides you are not allowed to access the resource and returns a 403 error.

@wp231957 @wongyusing I can see that all of my image URLs were already collected into imglist. Is the download failing because of the missing Referer header, or because of some other anti-scraping measure?
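Building on the Referer explanation above, one common fix is to install a global opener whose headers urlretrieve() will then reuse. This is a minimal sketch, not a tested fix for this particular site: the Referer value below (the site's front page, taken from the code in the question) is an assumption, and the server may check other things as well.

```python
import urllib.request

# Build an opener that sends both a browser-like User-Agent and a Referer.
# Whether this value satisfies the server's check is an assumption.
opener = urllib.request.build_opener()
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'),
    ('Referer', 'https://www.mzitu.com/'),
]

# install_opener() makes this opener the default for urlopen() and
# urlretrieve(), so every later request carries these headers.
urllib.request.install_opener(opener)

# After this, the original call in get_img() would send the headers:
# urllib.request.urlretrieve(each, filename)
```

Note that `urlretrieve()` itself has no parameter for custom headers, which is why the opener has to be installed globally; the alternative is to call `urlopen()` with a `Request` object and write the response bytes to a file yourself.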