[Help] Scrapy crawler keeps getting redirected while scraping data, please advise!!
After finishing 小甲鱼's Scrapy videos I excitedly went off to scrape some data, only to run into a page-redirect problem, so I've come here for advice. The site I'm scraping is https://www.tiebaobei.com/ue/1
After a few pages the crawler starts getting 302 redirect responses, and the redirect target becomes https://m.tiebaobei.com/ue/1
I noticed the only change is that www. becomes m.,
so I made the following change in class CehomeDownloaderMiddleware(object):
def process_request(self, request, spider):
    # Called for each request that goes through the downloader
    # middleware.
    # Must either:
    # - return None: continue processing this request
    # - or return a Response object
    # - or return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called
    # To work around the redirect, try rewriting the URL back to www. here
    # (needs `import time` and `logger = logging.getLogger(__name__)` at module level)
    if "//m.tiebaobei" in request.url:
        logger.debug(f"request.url is {request.url} -- hit the redirect, sleeping 2 minutes!")
        time.sleep(120)  # caution: this blocks the whole crawler, not just this request
        # Reschedule the request on the www. host via request.replace();
        # the original code called the private request._set_url() and its
        # debug line referenced an undefined `response` variable.
        return request.replace(url=request.url.replace("//m.", "//www."), dont_filter=True)
    return None
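For the host swap itself, rewriting only the netloc with urllib.parse is more robust than a string replace on the whole URL (which could also match text in the path or query). A minimal standalone sketch of that helper, using the two hostnames from the post (no Scrapy needed to show the logic):

```python
from urllib.parse import urlsplit, urlunsplit

def desktop_url(url):
    """Rewrite a mobile tiebaobei URL (m. host) back to the desktop (www.) host."""
    parts = urlsplit(url)
    if parts.netloc == "m.tiebaobei.com":
        # SplitResult is a namedtuple, so _replace swaps just the host part
        parts = parts._replace(netloc="www.tiebaobei.com")
    return urlunsplit(parts)

print(desktop_url("https://m.tiebaobei.com/ue/1"))   # -> https://www.tiebaobei.com/ue/1
print(desktop_url("https://www.tiebaobei.com/ue/2")) # unchanged
```

Inside the middleware you would then return request.replace(url=desktop_url(request.url), dont_filter=True) instead of calling the private request._set_url().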
But it still doesn't work. I've searched Baidu for lots of approaches and none of them helped, so I can only come here and ask.
Bumping this for you -- I don't understand it either.
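One thing worth knowing about the approach above: time.sleep() inside a Scrapy middleware blocks the whole Twisted reactor, so every in-flight request stalls for those 2 minutes. If the site pushes fast crawlers over to the m. host, the usual fix is to slow the crawl down via settings instead. A sketch of settings.py values -- the setting names are standard Scrapy settings, but the numbers are guesses to tune, not known-good values for this site:

```python
# settings.py -- throttle the crawl instead of calling time.sleep() in middleware
DOWNLOAD_DELAY = 3            # seconds between requests to the same domain
AUTOTHROTTLE_ENABLED = True   # let Scrapy adapt the delay to server responsiveness
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 60
```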