鱼C论坛 (FishC Forum)

Views: 7413 | Replies: 4

[Solved] Error when opening a page with requests.get

Posted 2017-11-4 16:13:08

I used res = requests.get("http://jobs.51job.com/all/co456898.html") and it raised an error, while other URLs open without any problem. Does this site have some anti-scraping mechanism?


Traceback (most recent call last):
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "E:\soft\Python36\lib\http\client.py", line 1331, in getresponse
    response.begin()
  File "E:\soft\Python36\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "E:\soft\Python36\lib\http\client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "E:\soft\Python36\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\soft\Python36\lib\site-packages\requests\adapters.py", line 440, in send
    timeout=timeout
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "E:\soft\Python36\lib\site-packages\urllib3\util\retry.py", line 357, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "E:\soft\Python36\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "E:\soft\Python36\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "E:\soft\Python36\lib\http\client.py", line 1331, in getresponse
    response.begin()
  File "E:\soft\Python36\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "E:\soft\Python36\lib\http\client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "E:\soft\Python36\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    res = requests.get("http://jobs.51job.com/all/co456898.html")
  File "E:\soft\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "E:\soft\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "E:\soft\Python36\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\soft\Python36\lib\site-packages\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "E:\soft\Python36\lib\site-packages\requests\adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None))
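For reference, you can check exactly which User-Agent requests sends when you don't set one yourself; a minimal sketch (the version number in the output depends on your installed requests):

import requests

# The default User-Agent is "python-requests/<version>".
print(requests.utils.default_user_agent())

# The full set of headers requests sends by default.
print(requests.utils.default_headers())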




Posted 2017-11-4 16:30:13
Yes. In that case you have to disguise your crawler: use proxy IPs or masquerade as a browser.
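For the proxy route, requests accepts a proxies mapping; a minimal sketch, assuming a working HTTP proxy at the (hypothetical) addresses shown:

import requests

# Hypothetical proxy endpoints; substitute proxies you actually control.
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

res = requests.get('http://jobs.51job.com/all/co456898.html',
                   proxies=proxies, timeout=10)
print(res.status_code)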

Posted 2017-11-4 17:13:59
Anti-scraping measures are getting more and more common.

Posted 2017-11-4 17:40:59 | Best answer
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0'
}

response = requests.get(url, headers=headers)

By default, your crawler identifies itself with a User-Agent like python-requests/x.xx.x. When the server sees packets carrying that kind of header, it simply refuses the connection.

The fix is to spoof the User-Agent.

For example, 'User-Agent': 'Baiduspider' is the header Baidu's own crawler identifies itself with.
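Putting it together, a minimal sketch with a timeout and basic error handling (the URL is the one from the question; the Firefox User-Agent string above is just an example, any mainstream browser UA should work):

import requests

url = 'http://jobs.51job.com/all/co456898.html'
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) '
                   'Gecko/20100101 Firefox/54.0')
}

try:
    # A browser-like User-Agent keeps the server from resetting the connection.
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    print(response.status_code, len(response.text))
except requests.exceptions.RequestException as exc:
    print('Request failed:', exc)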



OP | Posted 2017-11-9 15:13:56

Teagle posted on 2017-11-4 17:40:
    response = requests.get(url, headers=headers)
    By default the User-Agent your crawler sends is python-requests/x.xx.x

OK, thanks.
