I've run into a problem while writing a Scrapy crawler for user reviews of shops on the desktop version of Meituan Waimai. The version written with requests returns 200 and fetches the data successfully, but the Scrapy version returns 400 and fails. I can't figure out why; could someone take a look?
Here is the requests code first:
import requests

url = 'https://waimai.meituan.com/ajax/comment'  # Ajax endpoint for Meituan Waimai comments
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '241',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'uuid=8e9362664f0948d8a5d4.1557983309.1.0.0; _lxsdk_cuid=16abf0aad07c8-0431f185189d65-3c604504-1fa400-16abf0aad08c8; w_utmz="utm_campaign=(direct)&utm_source=(direct)&utm_medium=(none)&utm_content=(none)&utm_term=(none)"; w_uuid=DqQpKRQajmtt8yKTA5uNYZLKWcphwC9qZKiHKry1kb80O1WjoLKtm4nPDz4skpD0; _ga=GA1.3.41017888.1557983326; waddrname="%E5%8D%8E%E4%BE%A8%E5%A4%A7%E5%AD%A6%28%E6%B3%89%E5%B7%9E%E6%A0%A1%E5%8C%BA%29"; w_geoid=wskmguqgtp3w; w_cid=350503; w_cpy=fengzequ; w_cpy_cn="%E4%B8%B0%E6%B3%BD%E5%8C%BA"; w_ah="24.940978847444057,118.65084372460842,%E5%8D%8E%E4%BE%A8%E5%A4%A7%E5%AD%A6%28%E6%B3%89%E5%B7%9E%E6%A0%A1%E5%8C%BA%29"; __utma=211559370.698078701.1558067673.1558067673.1558067673.1; __utmz=211559370.1558067673.1.1.utmcsr=baidu|utmccn=baidu|utmcmd=organic|utmcct=zt_search; Hm_lvt_f66b37722f586a240d4621318a5a6ebe=1558067673; __mta=252484278.1557983327243.1558067738912.1558409690003.5; _gid=GA1.3.520237432.1558681235; _lx_utm=utm_source%3Dbaidu%26utm_campaign%3Dbaidu%26utm_medium%3Dorganic%26utm_content%3Dzt_search; w_visitid=fe58a7bf-408e-442b-adae-318f085100d9; _lxsdk_s=16aece83463-726-2ca-de5%7C%7C33; JSESSIONID=otcqlhtf82jigfigwn86mhvb; _gat=1',
    'Host': 'waimai.meituan.com',
    'Origin': 'https://waimai.meituan.com',
    'Referer': 'https://waimai.meituan.com/comment/144748066017325512',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest'
}
# Form data to submit
data = {
    'wmpoiIdStr': '144748066017325512',
    'offset': '1',
    'has_content': '1',
    'score_grade': '1',
    'uuid': 'DqQpKRQajmtt8yKTA5uNYZLKWcphwC9qZKiHKry1kb80O1WjoLKtm4nPDz4skpD0',
    'platform': '1',
    'partner': '4',
    'originUrl': 'https%3A%2F%2Fwaimai.meituan.com%2Fcomment%2F144748066017325512'
}
cookie = 'uuid=8e9362664f0948d8a5d4.1557983309.1.0.0; _lxsdk_cuid=16abf0aad07c8-0431f185189d65-3c604504-1fa400-16abf0aad08c8; w_utmz="utm_campaign=(direct)&utm_source=(direct)&utm_medium=(none)&utm_content=(none)&utm_term=(none)"; w_uuid=DqQpKRQajmtt8yKTA5uNYZLKWcphwC9qZKiHKry1kb80O1WjoLKtm4nPDz4skpD0; _ga=GA1.3.41017888.1557983326; waddrname="%E5%8D%8E%E4%BE%A8%E5%A4%A7%E5%AD%A6%28%E6%B3%89%E5%B7%9E%E6%A0%A1%E5%8C%BA%29"; w_geoid=wskmguqgtp3w; w_cid=350503; w_cpy=fengzequ; w_cpy_cn="%E4%B8%B0%E6%B3%BD%E5%8C%BA"; w_ah="24.940978847444057,118.65084372460842,%E5%8D%8E%E4%BE%A8%E5%A4%A7%E5%AD%A6%28%E6%B3%89%E5%B7%9E%E6%A0%A1%E5%8C%BA%29"; __utma=211559370.698078701.1558067673.1558067673.1558067673.1; __utmz=211559370.1558067673.1.1.utmcsr=baidu|utmccn=baidu|utmcmd=organic|utmcct=zt_search; Hm_lvt_f66b37722f586a240d4621318a5a6ebe=1558067673; __mta=252484278.1557983327243.1558067738912.1558409690003.5; _gid=GA1.3.520237432.1558681235; _lx_utm=utm_source%3Dbaidu%26utm_campaign%3Dbaidu%26utm_medium%3Dorganic%26utm_content%3Dzt_search; w_visitid=fe58a7bf-408e-442b-adae-318f085100d9; _lxsdk_s=16aece83463-726-2ca-de5%7C%7C32; JSESSIONID=e7r83ogmqq8d1aaqcswrjaary; _gat=1'
# Split on the first '=' only, since some cookie values themselves contain '='
cookies = {each.split('=', 1)[0]: each.split('=', 1)[1] for each in cookie.split('; ')}
response = requests.post(url=url, headers=headers, data=data, cookies=cookies)
print(response.status_code)
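For comparing the two clients later, it can help to dump exactly what requests put on the wire; a minimal debugging sketch, assuming the code above has already run:

# Debugging sketch: inspect the request that requests actually sent,
# so it can be compared field by field with what Scrapy sends.
prepared = response.request                    # the PreparedRequest behind the 200 response
print(prepared.method, prepared.url)
for name, value in prepared.headers.items():   # final headers, including the recalculated Content-Length
    print(name + ':', value)
print(prepared.body)                           # the urlencoded form body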
The Scrapy part is as follows:
import scrapy

class MtSpider(scrapy.Spider):
    name = 'mt'
    allowed_domains = ['waimai.meituan.com']
    start_url = 'https://waimai.meituan.com/ajax/comment'

    def start_requests(self):
        # yield scrapy.Request(url=self.start_url, method='POST', body=json.dumps(data), headers=headers, cookies=cookies, callback=self.parse)  # a method I found online, but it doesn't work either
        yield scrapy.FormRequest(url=self.start_url, formdata=data, headers=headers, cookies=cookies, callback=self.parse)
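To get more out of Scrapy than a bare 400, the error response can be allowed to reach the callback and cookie handling can be logged. A rough sketch of the debugging additions, assuming the same spider as above (start_requests unchanged):

class MtSpider(scrapy.Spider):
    name = 'mt'
    allowed_domains = ['waimai.meituan.com']
    start_url = 'https://waimai.meituan.com/ajax/comment'
    handle_httpstatus_list = [400]               # deliver 400 responses to parse() instead of dropping them
    custom_settings = {'COOKIES_DEBUG': True}    # log the Cookie / Set-Cookie headers Scrapy actually uses

    # start_requests() as in the snippet above

    def parse(self, response):
        self.logger.info('status: %s', response.status)
        self.logger.info('request headers sent: %s', response.request.headers)
        self.logger.info('request body sent: %s', response.request.body)
        self.logger.info('response body: %s', response.text[:500])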
The cookies, headers, and data here are copied over from the requests version, so they are identical. But the request comes back with a 400 and no data. Could someone help me figure out what is going on? Thanks!
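Not a definitive answer, but two differences that often explain a 200-with-requests / 400-with-Scrapy split: the Content-Length: 241 header copied from the browser (requests recalculates it from the body it builds, while a stale value carried into a Scrapy request can end up inconsistent with the form body Scrapy encodes), and sending cookies twice, once in the Cookie header and once through the cookies= argument, so that the cookie middleware and the manual header can conflict. A hedged sketch of the same FormRequest with those headers dropped; this would replace the yield inside start_requests, and whether it is the real cause is only a guess:

# Guess, not a confirmed fix: let Scrapy manage Content-Length and Cookie itself
# (Host and Connection are likewise typically set by the HTTP client).
clean_headers = {k: v for k, v in headers.items()
                 if k not in ('Content-Length', 'Cookie', 'Host', 'Connection')}

yield scrapy.FormRequest(
    url=self.start_url,
    formdata=data,          # Scrapy re-encodes this and computes Content-Length from it
    headers=clean_headers,
    cookies=cookies,        # cookies passed only here, not also as a raw Cookie header
    callback=self.parse,
)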
A reminder: Meituan Waimai will ban your IP and browser. If you get banned, switching to a different browser or IP gets things working again, and the ban usually lasts a day or two. The code can still crawl after that; I'm not entirely sure how the banning mechanism works.