Last edited by 学程序的古代人 on 2019-7-6 15:43
Accessing DXY (http://www.dxy.cn/) with Scrapy normally works fine, but I wanted to modify the request headers through a downloader middleware, so I defined a custom class:
```python
import random


class RandomUA(object):
    def __init__(self):
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
            "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16",
        ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # return None

    def process_response(self, response, spider):
        return response
```
Then I added this setting in settings.py:
```python
DOWNLOADER_MIDDLEWARES = {
    'heart.middlewares.RandomUA': 300,
}
```
But when I run it, the request fails:
```
2019-07-06 15:33:02 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://www.dxy.cn/robots.txt>: process_response() got an unexpected keyword argument 'request'
Traceback (most recent call last):
  File "/home/lip/opt/anaconda3/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/lip/opt/anaconda3/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/home/lip/opt/anaconda3/lib/python3.7/site-packages/twisted/internet/defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://www.dxy.cn/robots.txt>
```
I suspect the mistake is in my own class definition, but the examples I found online define it the same way, and the default spider works fine when the middleware is not enabled, so I can't figure out where the problem is. Could someone point it out? Thanks!
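For reference, the traceback itself hints at the cause: Scrapy's downloader-middleware machinery invokes `process_response` with `request`, `response`, and `spider` all passed as keyword arguments, so the method signature has to accept all three. A minimal sketch of the expected shape (same class as above, only the `process_response` signature differs; the UA list is trimmed for brevity):

```python
import random


class RandomUA(object):
    """Random-User-Agent middleware with the three-argument
    process_response signature that Scrapy expects."""

    def __init__(self):
        # Trimmed UA list for illustration.
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
        ]

    def process_request(self, request, spider):
        # Set a random User-Agent on every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # continue through the remaining middlewares

    def process_response(self, request, response, spider):
        # `request` must be in the signature: Scrapy calls this as
        # process_response(request=..., response=..., spider=...).
        return response
```

This matches the interface described in the Scrapy downloader-middleware documentation; whether it resolves this exact run is only inferred from the `unexpected keyword argument 'request'` message in the traceback.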