This post was last edited by fc5igm on 2023-2-6 17:27
The goal is to scrape the index list shown on the page below.
URL: https://www.csindex.com.cn/#/indices/family/list?index_series=5
To do this, I wrote the following code:
- import urllib.parse
- import urllib.request
- import json
- headers={'Accept': r'application/json, text/plain, */*', 'User-Agent': r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78'}
- url=r'https://www.csindex.com.cn/csindex-home/index-list/query-index-item'
- data='{"sorter":{"sortField":"null","sortOrder":null},"pager":{"pageNum":1,"pageSize":10},"indexFilter":{"ifCustomized":null,"ifTracked":null,"ifWeightCapped":null,"indexCompliance":null,"hotSpot":null,"indexClassify":["16","18"],"currency":null,"region":["china_mainland"],"indexSeries":["6","1","2","3"],"undefined":null}}'
- data = urllib.parse.urlencode(json.loads(data)).encode('utf-8')
- opener = urllib.request.build_opener()
- request = urllib.request.Request(url, headers=headers,data=data)
- response = opener.open(request,timeout=10).read().decode('utf-8', 'ignore')
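The response I got back was:
'{"code":"500","msg":"服务器异常,请联系管理员","data":null,"success":false}'
(The msg field reads "Server error, please contact the administrator.")
Adding a cookie to the headers made no difference either. What should I do in this situation?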
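This post was last edited by isdkz on 2023-2-6 17:51
I tried it, and this endpoint has no anti-scraping mechanism, so the User-Agent, Referer, and Cookie headers are all unnecessary.
For data, just encode the raw JSON string to bytes as-is, and use the Content-Type header to tell the server what kind of data the body contains. Passing the payload through urllib.parse.urlencode turns it into a form-encoded string (sorter=...&pager=...), which is no longer JSON, and that is most likely why the server answered with a 500.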
- import urllib.request
- import json
- headers={'Content-Type': 'application/json;charset=UTF-8'}
- url=r'https://www.csindex.com.cn/csindex-home/index-list/query-index-item'
- data='{"sorter":{"sortField":"null","sortOrder":null},"pager":{"pageNum":1,"pageSize":10},"indexFilter":{"ifCustomized":null,"ifTracked":null,"ifWeightCapped":null,"indexCompliance":null,"hotSpot":null,"indexClassify":null,"currency":null,"region":null,"indexSeries":["5"],"undefined":null}}'
- data = data.encode('utf-8')
- opener = urllib.request.build_opener()
- request = urllib.request.Request(url, headers=headers,data=data)
- response = opener.open(request,timeout=10).read().decode('utf-8', 'ignore')
- print(response)
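To make the difference between the two request bodies easier to see, here is a minimal sketch that builds the payload as a Python dict and serializes it with json.dumps instead of hand-writing the JSON string. The helper names build_payload and build_request are made up for illustration, and the field list is simply copied from the payload above; nothing here is confirmed beyond what this thread shows about the endpoint. The script itself does not touch the network, it only prints the two bodies side by side.

```python
import json
import urllib.parse
import urllib.request

URL = 'https://www.csindex.com.cn/csindex-home/index-list/query-index-item'


def build_payload(index_series, page_num=1, page_size=10):
    """Hypothetical helper: same fields as the payload in the answer above.

    Python None is serialized as JSON null by json.dumps.
    """
    return {
        'sorter': {'sortField': 'null', 'sortOrder': None},
        'pager': {'pageNum': page_num, 'pageSize': page_size},
        'indexFilter': {
            'ifCustomized': None,
            'ifTracked': None,
            'ifWeightCapped': None,
            'indexCompliance': None,
            'hotSpot': None,
            'indexClassify': None,
            'currency': None,
            'region': None,
            'indexSeries': list(index_series),
            'undefined': None,
        },
    }


def build_request(payload):
    """Hypothetical helper: wrap the payload in a JSON POST request (not sent)."""
    body = json.dumps(payload).encode('utf-8')
    headers = {'Content-Type': 'application/json;charset=UTF-8'}
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(URL, data=body, headers=headers)


payload = build_payload(['5'])

# What the original question sent: a form-encoded string, not JSON.
form_body = urllib.parse.urlencode(payload)
# What the answer sends: the JSON text itself, encoded as UTF-8 bytes.
request = build_request(payload)
json_body = request.data.decode('utf-8')

print('form-encoded:', form_body[:60])
print('JSON body:   ', json_body[:60])

# To actually send it (requires network access):
# with urllib.request.urlopen(request, timeout=10) as resp:
#     result = json.loads(resp.read().decode('utf-8'))
```

Building the dict first also makes it straightforward to change pageNum or indexSeries without editing a long string by hand.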