Posted on 2019-11-8 18:21:22
Well, I can't figure this one out. I don't know what to do with this error: urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
I don't know how to deal with it. Once this problem is solved, you should be able to use my code; the logic itself is sound.
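For reference, [WinError 10060] is Windows' connection-timeout error: whatever host urllib tried to reach never answered. With code like mine, the most likely suspects are the free proxies in the ips list. Here is a minimal sketch (my assumptions: the proxies are the culprit, and http://www.baidu.com is just a placeholder test URL) that probes each proxy with a short timeout and keeps only the ones that still respond:

import urllib.request

ips = ['121.205.13.12:9999', '60.184.173.0:3128', '220.249.149.151:9999',
       '113.121.23.62:9999', '183.11.235.48:9292']

def working_proxies(candidates, test_url='http://www.baidu.com', timeout=5):
    """Return only the proxies that can open test_url within `timeout` seconds."""
    good = []
    for ip in candidates:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({'http': ip}))
        try:
            # a dead or blocked proxy raises URLError here instead of hanging forever
            opener.open(test_url, timeout=timeout)
            good.append(ip)
        except Exception:
            pass  # skip proxies that time out or refuse the connection
    return good

print(working_proxies(ips))

Any proxy that gets filtered out here is exactly the kind that would later surface as WinError 10060 in the downloader.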
My code:

import os
import re
import random as rd
import time
import urllib.request

# disguise as a normal browser
headers = {'Connection': 'keep-alive',
           'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
           # ask for uncompressed content; urllib does not decode gzip/br by itself
           'Accept-Encoding': 'identity',
           'Accept': 'image/webp,*/*',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0'}
ips = ['121.205.13.12:9999', '60.184.173.0:3128', '220.249.149.151:9999',
       '113.121.23.62:9999', '183.11.235.48:9292']
addheaders = [('User-Agent', headers['User-Agent']),
              ('Connection', headers['Connection']),
              ('Accept-Language', headers['Accept-Language']),
              ('Accept-Encoding', headers['Accept-Encoding']),
              ('Accept', headers['Accept'])]
代理 = {'http': ips[rd.randint(0, len(ips) - 1)]}
prpxy_support = urllib.request.ProxyHandler(代理)
opener = urllib.request.build_opener(prpxy_support)
opener.addheaders = addheaders
# end of customisation

def download(url: 'page to scrape', path: 'folder to save into'):
    global opener
    os.makedirs(path, exist_ok=True)              # make sure the target folder exists
    html = opener.open(url, timeout=10)           # open the page; fail fast instead of hanging
    html = html.read().decode('utf-8', 'ignore')  # findall needs str, not bytes
    rule = re.compile(r'<img data.+?src="(?P<img>.+?)">')
    photos = rule.findall(html)
    num = 0
    le = len(photos)
    while num < le:
        each = photos[num]
        try:
            # pick a pseudo-random proxy so one dropped connection does not stop everything
            代理 = {'http': ips[rd.randint(0, len(ips) - 1)]}
            prpxy_support = urllib.request.ProxyHandler(代理)
            opener = urllib.request.build_opener(prpxy_support)
            opener.addheaders = addheaders        # the rebuilt opener needs the headers again
            pho = opener.open(each, timeout=10)
            tag = pho.read()
            with open(path + os.sep + str(num) + '.jpg', 'wb') as f:
                f.write(tag)
            num += 1
        except Exception:
            print('connection dropped, retrying')
            time.sleep(2)
    print('download finished')

if __name__ == '__main__':
    download('http://www.360doc.com/content/18/0702/12/46062816_767071220.shtml', '测试')
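If you want to rule the proxies out entirely, the quickest check is probably to build the opener without a ProxyHandler (a direct connection) and open the same page; if that works but the proxied version still times out with 10060, the code is fine and the free proxies are simply dead or blocked.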