How do I detect a 403 in Python?

takeshi · posted on 2017-10-7 22:20:46

I'm crawling web pages and want to save the ones that come back as 403.
The problem is: how do I detect the 403?
Any help would be appreciated!

ba21 · posted on 2017-10-7 23:14:01

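Use exception handling and check the error code on the exception.
For example:
Version 1:
import urllib.request
from urllib.error import HTTPError, URLError

req = urllib.request.Request("http://www.fishc.com/ooxx.html")
try:
    urllib.request.urlopen(req)
except HTTPError as e:
    # The server answered with an error status such as 403 or 404.
    print(e.code)
    print(e.reason)
    print(e.read())
except URLError as e:
    # No HTTP response at all (DNS failure, refused connection, ...).
    print(e.reason)

Version 2 (recommended):
try:
    response = urllib.request.urlopen(req)
except URLError as e:
    # HTTPError is a subclass of URLError and has both attributes,
    # so 'code' must be checked first or it is never printed.
    if hasattr(e, 'code'):
        print(e.code)
    elif hasattr(e, 'reason'):
        print(e.reason)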
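
To tie this back to the original question (saving the pages that return 403), here is a minimal sketch built on the same exception handling. The function name `save_if_forbidden`, its `path` argument, and the `opener` parameter are made up for illustration; `opener` only exists so the function can be tried without a real server. Network failures (`URLError` without a status code) are deliberately left to propagate.

```python
import urllib.error
import urllib.request


def save_if_forbidden(url, path, opener=urllib.request.urlopen):
    """Write the response body to `path` and return True if `url` answers 403.

    Hypothetical helper: returns False for any other outcome that produced
    an HTTP response.
    """
    try:
        with opener(url):
            # The request succeeded, so there is no 403 page to save.
            return False
    except urllib.error.HTTPError as e:
        if e.code != 403:
            return False
        # HTTPError is also a file-like response; read() gives the error page.
        with open(path, "wb") as f:
            f.write(e.read())
        return True
```

Something like `save_if_forbidden("http://www.fishc.com/ooxx.html", "forbidden.html")` would then leave a file behind only when the server refuses the request with 403.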

purplenight · posted on 2017-10-8 11:13:15

It seems Python 3 has no built-in handling for 403, 404, and so on.

You could try http.client.

Documentation link: https://docs.python.org/3/library/http.client.html#examples

>>> import http.client
>>> conn = http.client.HTTPSConnection("www.python.org")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read() # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> while not r1.closed:
...     print(r1.read(200))  # 200 bytes
b'<!doctype html>\n<!--[if"...
...
>>> # Example of an invalid request
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
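
Adapting the documentation example to the 403 case could look like the sketch below. With http.client an error status does not raise; it is just a number on the response, so the check is a plain comparison. The name `is_forbidden` and the `connection_cls` parameter are assumptions for illustration (the latter lets you pass `http.client.HTTPConnection` for plain-HTTP hosts).

```python
import http.client


def is_forbidden(host, path="/", port=None,
                 connection_cls=http.client.HTTPSConnection):
    """Return (True, body) if GET `path` on `host` answers 403, else (False, body).

    Hypothetical helper; `body` is the raw response content as bytes.
    """
    conn = connection_cls(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()  # read fully before closing the connection
        return response.status == 403, body
    finally:
        conn.close()
```

The returned body can be written to a file when the first element is True, which covers the "save the 403 page" part of the question.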
