小强森 posted on 2019-6-11 15:49:14

Beginner scraper help: urllib.request.urlretrieve(each, filename, None) raises an error

The filename has already been extracted from the URL stored in each:
filename = each.split("/")[-1]
print(filename)
But the following line fails to save the file locally and raises an error:
urllib.request.urlretrieve(each, filename, None)

Error message:
Traceback (most recent call last):
File "D:/pycharm/小甲鱼/2059正则1.py", line 23, in <module>
    get_img(open_url(url))
File "D:/pycharm/小甲鱼/2059正则1.py", line 19, in get_img
    urllib.request.urlretrieve(each,filename,None)
File "C:\Program Files\Python37\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
File "C:\Program Files\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
File "C:\Program Files\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
File "C:\Program Files\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
File "C:\Program Files\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1


I'm new to web scraping and just copied the example code. I don't know what the problem is or how to fix it. Any help would be appreciated!

wp231957 posted on 2019-6-11 16:35:34

Show your code.

小强森 posted on 2019-6-11 16:44:41

import re
import urllib.request

def open_url(url):  # fetch the page HTML
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36')
    page = urllib.request.urlopen(req)
    html = page.read().decode('utf-8')
    return html

def get_img(html):
    p = r'https://i.meizitu.net/thumbs/2019/06/18\d\d\d\d_\d\d.\d\d_236.jpg'
    imglist = re.findall(p, html)  # still contains a lot of useless matches

    for each in imglist:
        filename = each.split("/")[-1]  # extract the file name
        urllib.request.urlretrieve(each, filename, None)

if __name__ == '__main__':
    url = 'https://www.mzitu.com/'
    get_img(open_url(url))

小强森 posted on 2019-6-11 16:46:05

Thanks! The code is above. @wp231957

wp231957 posted on 2019-6-11 16:56:57

小强森 posted on 2019-6-11 16:46
Thanks! The code is above. @wp231957

The mzitu site added anti-scraping measures long ago; 小甲鱼's original code no longer works.

wongyusing posted on 2019-6-11 17:03:10

Put it this way: your request headers are missing the Referer header,
so the server decides you are not allowed to access the resource and returns a 403 error.
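
Below is a minimal sketch (not from the original thread) of how the image download could send both a User-Agent and a Referer header. urlretrieve does not attach custom headers by itself, so the sketch fetches the bytes with urlopen and writes them to disk manually; download_img is a hypothetical helper, and the Referer value 'https://www.mzitu.com/' is an assumption about what the site checks.

import urllib.request

def download_img(each, filename):
    # urlretrieve cannot carry custom headers, so build a Request instead.
    req = urllib.request.Request(each)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36')
    # Assumed: the image host checks Referer and returns 403 when it is missing.
    req.add_header('Referer', 'https://www.mzitu.com/')
    # Fetch the image bytes and write them to a local file.
    with urllib.request.urlopen(req) as response, open(filename, 'wb') as f:
        f.write(response.read())

Calling download_img(each, each.split('/')[-1]) in place of the urlretrieve line would keep the rest of get_img unchanged; whether this alone is enough depends on what other anti-scraping checks the site applies.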

小强森 posted on 2019-6-11 17:09:34

@wp231957 @wongyusing I can see that all of the image URLs have been collected into imglist, so is the download failing because of the missing Referer header, or because of the site's anti-scraping measures?