FishC Forum

[Solved] Scrapy spider reports 301 redirect errors

Posted on 2018-11-6 14:39:04

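I recently wrote a Scrapy spider to scrape Taobao data, but every run comes back incomplete. I eventually found that some of the item pages return a 301. I have already spoofed the User-Agent and added a Referer header, yet some pages still cannot be scraped. How have others dealt with this situation? Part of the log output follows: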
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=548573777437> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
https://item.taobao.com/item.htm?id=552267374137
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=571301219510> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
https://item.taobao.com/item.htm?id=558688071694
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=570309089342> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
https://item.taobao.com/item.htm?id=521785652630
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=571296540068> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
2018-11-06 14:27:39 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <301 https://item.taobao.com/item.htm?id=553289240364>: HTTP status code is not handled or not allowed
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=537598464080> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://item.taobao.com/item.htm?id=546867778273> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
https://item.taobao.com/item.htm?id=561112325868
https://item.taobao.com/item.htm?id=566378917121
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=45033959543> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=570346798714> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=578947695290> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
2018-11-06 14:27:39 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=13298514524> (referer: https://s.taobao.com/search?q=%E6%98%BE%E7%A4%BA%E5%99%A8)
https://item.taobao.com/item.htm?id=523249394550
https://item.taobao.com/item.htm?id=548573777437
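To gauge how many item pages are affected, log output like the above can be tallied by status code. The following is a minimal sketch using only the standard library; the function name `tally_statuses` and the regular expression are illustrative assumptions, matched against the `Crawled (NNN) <GET url>` line format shown in this log.

```python
import re
from collections import Counter

# Matches Scrapy engine lines of the form seen in the log above:
#   ... DEBUG: Crawled (301) <GET https://item.taobao.com/item.htm?id=...> (referer: ...)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET ([^>\s]+)>")


def tally_statuses(log_lines):
    """Return (Counter of status codes, list of URLs that came back as 301)."""
    counts = Counter()
    redirected = []
    for line in log_lines:
        match = CRAWLED.search(line)
        if match is None:
            continue  # bare URLs printed by the spider and other log lines are skipped
        status, url = int(match.group(1)), match.group(2)
        counts[status] += 1
        if status == 301:
            redirected.append(url)
    return counts, redirected
```

Feeding it the lines of a saved log file (for example `tally_statuses(open("crawl.log", encoding="utf-8"))`, where `crawl.log` is a hypothetical file name) gives the ratio of 200 to 301 responses and the list of item URLs worth retrying.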
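Best answer
2018-11-6 19:26:35
Your request headers are missing parameters.
Taobao now requires a login before it will show the data.
You need the cookies from your logged-in session.

You also need to slow the crawl down.
Crawling too fast will likewise make Taobao redirect you to a verification or login page.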
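The two suggestions above could be sketched roughly as follows. This is an illustration under assumptions, not a verified fix for Taobao: the helper `cookie_header_to_dict` and the name `THROTTLE_SETTINGS` are hypothetical, the setting values are untuned guesses, and whether a cookie copied from a browser session is enough depends on the site. The setting keys themselves are standard Scrapy settings, and `scrapy.Request` accepts a `cookies` dict.

```python
def cookie_header_to_dict(header):
    """Turn a browser 'Cookie:' header value ('a=1; b=2') into a dict."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty fragments and parts without '='
            cookies[name] = value
    return cookies


# Standard Scrapy setting names; the values are illustrative, not tuned for Taobao.
THROTTLE_SETTINGS = {
    "DOWNLOAD_DELAY": 3,                  # seconds between requests to the same site
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # vary the delay around DOWNLOAD_DELAY
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to server latency
    "COOKIES_ENABLED": True,              # keep the cookies middleware active
}

# Usage inside a spider (requires Scrapy; shown as comments so this block runs without it):
#
#   class ItemSpider(scrapy.Spider):
#       name = "items"
#       custom_settings = THROTTLE_SETTINGS
#
#       def start_requests(self):
#           cookies = cookie_header_to_dict(BROWSER_COOKIE_HEADER)  # pasted from a logged-in browser
#           yield scrapy.Request(url, cookies=cookies, callback=self.parse)
```

Keeping the throttle values in `custom_settings` limits them to this one spider; the same keys can go in the project's `settings.py` instead.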