Why doesn't the proxy work with requests, when it works with urllib.request?

幽梦三影 · posted on 2018-7-10 07:47:51

import requests
import urllib.request
import random
import re
import os

def install():
    users = [
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)",
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; QIHU 360EE)"
    ]
    ips = [
        "http://61.191.41.130",
        "http://122.238.12.191",
        "http://222.161.56.166"
    ]
    ip = random.choice(ips)
    user = random.choice(users)
    return ip, user

def get_html(url):
    num = install()
    rep = requests.get(url, data={'User-Agent': num[1]}, proxies={'https': num[0]}, stream=True)
    return rep

name = input('Enter the product to search for: ')
num = int(input('Enter the number of pages: '))
key = urllib.request.quote(name)
os.mkdir(name)
os.chdir(name)

def main():
    for j in range(num):
        url = 'https://s.taobao.com/search?q=' + key + '&s=' + str(j*44)
        html = get_html(url).text
        s = re.findall(r'"pic_url":"([^"]+?)".+?"view_price":"([^"]+?)".+?"view_sales":"([^"]+?)"', html)
        os.mkdir('%s%d' % (name, j+1))
        os.chdir('%s%d' % (name, j+1))
        for i in s:
            img = get_html('http:' + str(i[0]))  # the proxy can't be used when fetching here
            print(img)
            with open('%s元_%s.jpg' % (i[1], i[2]), 'wb') as f:
                f.write(img.content)
        os.chdir(os.pardir)

if __name__ == '__main__':
    main()


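幽梦三影 · posted on 2018-7-10 09:31:38

iwanna posted on 2018-7-10 09:29
Maybe it's because your proxy IPs are http, but the key you wrote in proxies is https.
Also, does the User-Agent go in data? As I recall, it should go in headers{: ...

It's headers, I got that wrong.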
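
To make the two points in the quoted reply concrete, here is a minimal sketch, assuming requests is installed. It only prepares requests offline (nothing is sent) to show where the User-Agent ends up, and it uses requests.utils.select_proxy, which as far as I know is the helper requests uses to match a target URL against the proxies dict. The example URLs and variable names are illustrative, the image path is made up, and the proxy address is the one from the post and may well be offline.

```python
import requests
from requests.utils import select_proxy

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0")
PROXY = "http://61.191.41.130"  # taken from the post; assumed for illustration only
SEARCH_URL = "https://s.taobao.com/search?q=test"
IMAGE_URL = "http://g-search1.alicdn.com/img/example.jpg"  # hypothetical image path

# data= puts the dict into the request body, so no User-Agent header is set.
wrong = requests.Request("GET", SEARCH_URL, data={"User-Agent": USER_AGENT}).prepare()

# headers= is the argument that actually sets the User-Agent header.
right = requests.Request("GET", SEARCH_URL, headers={"User-Agent": USER_AGENT}).prepare()

# The keys of proxies are matched against the scheme of the URL being fetched,
# not against the scheme of the proxy address.
https_only = {"https": PROXY}
both_schemes = {"http": PROXY, "https": PROXY}

print("User-Agent" in wrong.headers)          # header missing when passed via data=
print(right.headers["User-Agent"])            # header present when passed via headers=
print(select_proxy(IMAGE_URL, https_only))    # no proxy chosen for an http:// URL
print(select_proxy(IMAGE_URL, both_schemes))  # proxy chosen once an "http" key exists
```

With both keys present, a call such as requests.get(url, headers={"User-Agent": USER_AGENT}, proxies=both_schemes) would route the search page and the images through the same proxy, provided that proxy is reachable.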

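幽梦三影 · posted on 2018-7-10 09:41:50

iwanna posted on 2018-7-10 09:29
Maybe it's because your proxy IPs are http, but the key you wrote in proxies is https.
Also, does the User-Agent go in data? As I recall, it should go in headers{: ...

Fetching a single image works, but fetching multiple images doesn't.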

幽梦三影 · posted on 2018-7-10 10:02:06

iwanna posted on 2018-7-10 09:58
What do you mean by "doesn't work"? What error do you get?

raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPConnectionPool(host='222.161.56.166', port=80): Max retries exceeded with url: http://g-search1.alicdn.com/img/ ... !0-saturn_solar.jpg (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000021D734E6B38>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
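
The traceback says the connection to the proxy itself was refused (222.161.56.166 on port 80, the default when the proxy URL gives no port), so that proxy is probably offline or listening on a different port. One way to cope is to try the proxies in turn and move on when one fails. The sketch below assumes requests is installed; the function name get_via_first_working_proxy and its get parameter are hypothetical and exist only so the logic can be exercised without a network.

```python
import requests

def get_via_first_working_proxy(url, proxy_urls, headers=None, timeout=5, get=requests.get):
    """Try each proxy in order and return the first successful response.

    `get` defaults to requests.get; it is a parameter only so that a stand-in
    can be passed in for testing (an assumption of this sketch).
    """
    last_error = None
    for proxy in proxy_urls:
        try:
            return get(
                url,
                headers=headers,
                # Use the same proxy for both http:// and https:// target URLs.
                proxies={"http": proxy, "https": proxy},
                timeout=timeout,
            )
        except (requests.exceptions.ProxyError,
                requests.exceptions.ConnectTimeout) as exc:
            # This proxy refused the connection or timed out; try the next one.
            last_error = exc
    raise RuntimeError("no working proxy among %d candidates" % len(proxy_urls)) from last_error
```

Called as get_via_first_working_proxy(url, ips, headers={'User-Agent': user}), this would skip a dead proxy instead of aborting the whole download. If every proxy in the list is dead, it raises RuntimeError with the last proxy error attached as the cause.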

幽梦三影 · posted on 2018-7-10 10:46:35

I couldn't find any other way, so this is what I ended up with:

import requests
import urllib.request as u
import random
import re
import os

def get_html(url):
    user = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0"
    ips = [
        '222.161.56.166',
        '122.238.12.191',
        '61.191.41.130'
    ]
    ip = random.choice(ips)
    p = u.ProxyHandler({'http': ip})
    opener = u.build_opener(p)
    # Note: install_opener only affects urllib.request.urlopen. The requests.get
    # call below does not use this opener, so it is sent without the proxy.
    u.install_opener(opener)
    rep = requests.get(url, headers={'User-Agent': user})
    return rep

name = input('Enter the product to search for: ')
num = int(input('Enter the number of pages: '))
key = u.quote(name)
os.mkdir(name)
os.chdir(name)

def main():
    for j in range(num):
        url = 'https://s.taobao.com/search?q=' + key + '&s=' + str(j*44)
        html = get_html(url).text
        s = re.findall(r'"pic_url":"([^"]+?)".+?"view_price":"([^"]+?)".+?"view_sales":"([^"]+?)"', html)
        os.mkdir('%s%d' % (name, j+1))
        os.chdir('%s%d' % (name, j+1))
        for i in s:
            img = get_html(url=('http:' + i[0]))
            with open('%s元_%s.jpg' % (i[1], i[2]), 'wb') as f:
                f.write(img.content)
        os.chdir(os.pardir)

if __name__ == '__main__':
    main()
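
A note on why this version runs: as far as I know, urllib.request.install_opener only changes what urllib.request.urlopen uses, while requests manages its own connections. The requests.get call above therefore most likely goes out directly, with no proxy involved. If the goal is to actually send the traffic through the proxy with urllib, a minimal standard-library sketch could look like the following. The helper names build_proxy_opener and fetch_bytes are hypothetical, and the proxy address is the one from the post, which may be offline.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0")

def build_proxy_opener(proxy_host, user_agent=USER_AGENT):
    """Return an opener that routes http and https requests through proxy_host."""
    handler = urllib.request.ProxyHandler({"http": proxy_host, "https": proxy_host})
    opener = urllib.request.build_opener(handler)
    # Replace urllib's default User-Agent with a browser-like one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener

def fetch_bytes(opener, url, timeout=10):
    """Fetch url through the given opener and return the raw response body."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()

# Building the opener does not touch the network; only fetch_bytes does.
opener = build_proxy_opener("222.161.56.166")
```

Here fetch_bytes(opener, url) would go through the proxy because the request is made by the opener itself. Using the opener directly, rather than installing it globally, also makes it possible to use a different proxy per request.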
