MSDN I TELL YOU crawler help, Python Discussion, Programming Languages, FishC Forum

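lengyue869 posted on 2020-11-1 20:32:04

MSDN I TELL YOU crawler help

[Help] I want to use a crawler to fetch resources as follows:

Open https://msdn.itellyou.cn/, type ASP.NET into the search box, and download the resources the search returns.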

http://ys-d.ys168.com/613967824/j7137544465L4LjuSNHT/111.png

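suchocolate posted on 2020-11-2 09:46:58

There is an x-csrf-token header that is probably generated by the page's JS from parameters the server sends; I don't know how it is generated yet.
The quick workaround is to copy the token the browser generated and reuse it, which does return data.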

import requests

def main():
    url = 'https://msdn.itellyou.cn/Index/Search'
    # The cookie and x-csrf-token below were copied from a browser session,
    # so they will stop working once that session expires.
    headers = {'host': 'msdn.itellyou.cn',
               'user-agent': 'mozilla',
               'x-requested-with': 'XMLHttpRequest',
               'cookie': '.AspNetCore.Antiforgery.kC_Kc8he0KM=CfDJ8Jw19B-OaM1KveQHPjyyKOMADAmMg2q5toW_LJWlqEXnU0jD9YC6wUstfDumTKhBH0rNObkFecQLizZmVdRAQjmo8v15j9AC_r7dMC4mLpbYgE4iY87M2pp2cmbzF0fxx84lnQLnTwFpHepRbzYobPA; UM_distinctid=175868b07b9a1-0004b0cc9965b78-116b634a-144000-175868b07ba123; CNZZDATA1605814=cnzz_eid%3D1160383727-1604275704-https%253A%252F%252Ffishc.com.cn%252F%26ntime%3D1604275704; _ga=GA1.2.1891466530.1604280060; _gid=GA1.2.888516437.1604280060; Hm_lvt_8688ca4bc18cbc647c9c68fdaef6bc24=1604280060,1604280659; Hm_lpvt_8688ca4bc18cbc647c9c68fdaef6bc24=1604280659; _gat=1',
               'x-csrf-token': 'CfDJ8Jw19B-OaM1KveQHPjyyKONjKrajxjskyf-i-AYUela7tX0R6jfXuGhFUWXu0Ddf1x4jxrcRj1b5Lw9pbFUoikM8NesPhSlHr60O8YzEO5tysfLDWZ-WgBbTAuab6Hb3gEU5boFJgRmEJurpu5hc_2A'}
    # Form fields sent by the site's search box.
    data = {'keyword': 'ASP.NET', 'filter': 'true'}
    r = requests.post(url, headers=headers, data=data)
    # Pull the download link out of the JSON response.
    result = r.json()['result']['list']['product']['url']
    print(result)

if __name__ == '__main__':
    main()
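
If the copied token has expired or the request is rejected, the JSON may not contain the result / list / product path, and the chained lookup above would raise a KeyError. A minimal sketch of a more forgiving lookup, assuming the response is made of nested dicts as the code above implies. The helper name dig and both sample payloads are made up for illustration and are not taken from the site:

```python
from typing import Any


def dig(payload: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key; return default if any step is missing."""
    current = payload
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical payloads, shaped after the key path used in the reply above.
ok = {'result': {'list': {'product': {'url': 'ed2k://example'}}}}
rejected = {'error': 'invalid token'}

print(dig(ok, 'result', 'list', 'product', 'url'))        # ed2k://example
print(dig(rejected, 'result', 'list', 'product', 'url'))  # None
```

In the script above this would stand in for the chained indexing, for example result = dig(r.json(), 'result', 'list', 'product', 'url'), followed by a check for None before printing.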

YunGuo posted on 2020-11-5 16:00:16

First get the token and cookie from the home page, then fetch the results through the search endpoint.
import requests
import re

# url[0] is the home page (source of the token and cookie); url[1] is the search endpoint.
url = ['https://msdn.itellyou.cn/', 'https://msdn.itellyou.cn/Index/Search']

def get_index():
    # Get the token and cookie from the home page.
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'
    }
    response = requests.get(url[0], headers=headers)
    if response.status_code == 200:
        token = re.findall('data-token=(.*?)>', response.content.decode())
        cookie = response.headers.get('set-cookie')
        if token:
            # findall returns a list; use the first match and drop any
            # quotes around the attribute value.
            return token[0].strip('"\''), cookie
    # Always return a pair so the caller can unpack the result.
    return None, None

def get_search(token, cookie, keyword):
    # Query the search endpoint and print the result.
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36',
        'x-csrf-token': token,
        'cookie': cookie
    }

    form_data = {
        'keyword': keyword,
        'filter': 'true'
    }
    response = requests.post(url[1], headers=headers, data=form_data).json()
    data = response['result']['list']['product']
    name = data['name']
    ed2k = data['url']
    print('File name:', name)
    print(ed2k)

if __name__ == '__main__':
    word = input('Enter a valid keyword: ')
    token, cookie = get_index()
    if token is None:
        print('Could not get a token from the home page.')
    else:
        get_search(token, cookie, word)
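
The two steps above can also share one session, so the cookie does not have to be copied by hand: a requests.Session keeps the cookies set by the first response and sends them with the next request. A rough sketch, assuming the home page exposes the token in a data-token attribute as the regex above implies (the exact markup is not confirmed). The names extract_token and search are illustrative, and the session is passed in as a parameter so the parsing can be tried without touching the network:

```python
import re
from typing import Any, Optional

# Assumption: the token sits in a data-token attribute, quoted or unquoted.
TOKEN_RE = re.compile(r'data-token=["\']?([^"\'\s>]+)')


def extract_token(html: str) -> Optional[str]:
    """Return the first data-token value found in html, or None."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def search(session: Any, keyword: str,
           base: str = 'https://msdn.itellyou.cn') -> Any:
    """Load the home page, then POST the search through the same session.

    session is expected to behave like requests.Session, which stores the
    cookies from the GET and sends them back on the POST.
    """
    home = session.get(base + '/', headers={'user-agent': 'mozilla'})
    home.raise_for_status()
    token = extract_token(home.text)
    if token is None:
        raise RuntimeError('no data-token attribute found on the home page')
    response = session.post(
        base + '/Index/Search',
        headers={
            'user-agent': 'mozilla',
            'x-csrf-token': token,
            # Header copied from the first reply; it may not be required.
            'x-requested-with': 'XMLHttpRequest',
        },
        data={'keyword': keyword, 'filter': 'true'},
    )
    response.raise_for_status()
    return response.json()
```

With requests installed, this would be called as search(requests.Session(), 'ASP.NET'), and the returned JSON can be read the same way as in get_search.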
