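Last edited by fc5igm on 2023-2-6 17:27
The goal is to scrape the information from the web page below.
URL: https://www.csindex.com.cn/#/indices/family/list?index_series=5
To do that, I wrote the following code:
import urllib.parse
import urllib.request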
import json
headers={'Accept': r'application/json, text/plain, */*', 'User-Agent': r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78'}
url=r'https://www.csindex.com.cn/csindex-home/index-list/query-index-item'
data='{"sorter":{"sortField":"null","sortOrder":null},"pager":{"pageNum":1,"pageSize":10},"indexFilter":{"ifCustomized":null,"ifTracked":null,"ifWeightCapped":null,"indexCompliance":null,"hotSpot":null,"indexClassify":["16","18"],"currency":null,"region":["china_mainland"],"indexSeries":["6","1","2","3"],"undefined":null}}'
data = urllib.parse.urlencode(json.loads(data)).encode('utf-8')
opener = urllib.request.build_opener()
request = urllib.request.Request(url, headers=headers,data=data)
response = opener.open(request,timeout=10).read().decode('utf-8', 'ignore')
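The response I got back was the following (the msg reads "Server error, please contact the administrator"):
'{"code":"500","msg":"服务器异常,请联系管理员","data":null,"success":false}'
Adding a cookie to the headers made no difference at all. What should I do in this situation?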
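Last edited by isdkz on 2023-2-6 17:51
I tried it out: this endpoint has no anti-scraping measures, so the User-Agent, Referer, and Cookie headers are all unnecessary.
For data, just encode the original JSON string to bytes as it is, and use the Content-Type header to tell the server what format the body is in.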
import urllib.request
import json
headers={'Content-Type': 'application/json;charset=UTF-8'}
url=r'https://www.csindex.com.cn/csindex-home/index-list/query-index-item'
data='{"sorter":{"sortField":"null","sortOrder":null},"pager":{"pageNum":1,"pageSize":10},"indexFilter":{"ifCustomized":null,"ifTracked":null,"ifWeightCapped":null,"indexCompliance":null,"hotSpot":null,"indexClassify":null,"currency":null,"region":null,"indexSeries":["5"],"undefined":null}}'
data = data.encode('utf-8')
opener = urllib.request.build_opener()
request = urllib.request.Request(url, headers=headers,data=data)
response = opener.open(request,timeout=10).read().decode('utf-8', 'ignore')
print(response)
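As a side note, here is a minimal sketch of why the code in the question likely failed. It compares the body produced by urllib.parse.urlencode with the JSON body the working example sends, without contacting the server. The helper names build_request and check_response are made up for illustration, and check_response assumes every reply carries the "success" and "msg" fields seen in the error reply quoted in the question.

```python
import json
import urllib.parse
import urllib.request

URL = "https://www.csindex.com.cn/csindex-home/index-list/query-index-item"

# Same JSON text as in the working example above.
RAW = ('{"sorter":{"sortField":"null","sortOrder":null},'
       '"pager":{"pageNum":1,"pageSize":10},'
       '"indexFilter":{"ifCustomized":null,"ifTracked":null,'
       '"ifWeightCapped":null,"indexCompliance":null,"hotSpot":null,'
       '"indexClassify":null,"currency":null,"region":null,'
       '"indexSeries":["5"],"undefined":null}}')
payload = json.loads(RAW)

# Question's approach: urlencode() writes each nested dict as its Python
# repr inside a key=value form string, so the body is no longer JSON.
form_body = urllib.parse.urlencode(payload).encode("utf-8")

# Answer's approach: send the JSON text itself as UTF-8 bytes.
json_body = json.dumps(payload).encode("utf-8")


def build_request(url, payload):
    """Build (without sending) a POST request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )


def check_response(text):
    """Parse a reply and raise if it reports failure.

    Assumes every reply has the "success" and "msg" fields seen in the
    error reply quoted in the question.
    """
    reply = json.loads(text)
    if not reply.get("success"):
        raise RuntimeError(f'{reply.get("code")}: {reply.get("msg")}')
    return reply.get("data")


request = build_request(URL, payload)
print(form_body[:60])   # form-encoded, starts with b'sorter=...'
print(json_body[:60])   # JSON, starts with b'{"sorter": ...'
```

To actually send it, something like check_response(urllib.request.urlopen(request, timeout=10).read().decode("utf-8")) should work, though the shape of a successful reply is not shown in this thread, so treat the return value as unverified.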