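Help with a web scraper
Last edited by siryang on 2023-5-26 15:30
Question 1: While learning to scrape NetEase Cloud Music comments, I followed 小甲鱼's steps and copied the params and encSecKey values from my browser. The request fails with the values I copied myself, but works with the two values from 小甲鱼's tutorial. Why is that?
小甲鱼's tutorial thread: https://fishc.com.cn/thread-100435-1-1.html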
params = "EuIF/+GM1OWmp2iaIwbVdYDaqODiubPSBToe5EdNp6LHTLf+aID/dWGU6bHWXS0jD9pPa/oY67TOwiicLygJ+BhMkOX/J1tZMhq45dcUIr6fLuoHOECYrOU6ySwH4CjxxdbW3lpVmksGEdlxbZevVPkTPkwvjNLDZHK238OuNCy0Csma04SXfoVM3iLhaFBT"
encSecKey = "db26c32e0cd08a11930639deadefda2783c81034be6445ca8f4fbedd346e1f9567375083aeb1a85e6ad6d9ae4532a49752c2169db8bcc04d38a79f9bed7facea42ee23f1b33538c34f82741318d9b4b846663b53b0b808dd0499dccfbc6c61fbf180c6fb24b1c2dd3c2c450ce09917d74be9424dab836fd2e671988ffbc6ae1b"
# encSecKey = " 6d407605344a08d156d8dd7251c756b8c98a69ef8dd670ffcdc5d7db0019cbfa283a07707363c9a54a6d749b2fceda32b4e450d2988741f4c48df304263fa213de85ec14425bf2c5cd3c64c3fd7bf90d9bfed66438d02c8d60078a09ecc2273be30aa6fbe4082c3dd3f18cb23efbfeab6f6209c173147b6f8f768be296c6a5b5"
# params = "aYI5Lgk6cIIEOKVLrbjf3jakQsRFxSFygJEu7CJ5pQ + YHT5Jgt3KLeFlUAKCp2zZDrfz + a1eSoRNJa + RWGsRaT + O1k3wGP6IgB8qd3oPjas2rpqngCgxh9ymYk2z0Qn4gU8pd2cJ8uiEBsTT3S0d5tLOEUAS + qbjRD9gI / H3XGeRNH8HBTUBz0 / P / cB4dvFcKS76lynAiuzZiHdrgmcWDK7MEa0r + uDdpDDtjutZpJBPXHvhN20L6 + KbkskBqEtrWYDV9YiZEVzC3tNZ / LQQNveA0pq + X + tl70vac5IUqfI ="
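Question 2: The request URL shown in the request headers for comments on the NetEase Cloud Music site no longer contains the song ID. For example, the request URL is now https://music.163.com/weapi/comment/resource/comments/get?csrf_token= and using that URL returns {"msg":"参数错误","code":400} ("parameter error"). How should this case be handled?

isdkz (posted 2023-5-26 16:09):
Could you post the complete code?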
# @Time: 2023/5/26 13:27
# @Author: YL
# @File: pachong_wangyiyun.py
# @Software: PyCharm
import requests
import bs4
import re
import openpyxl
import json


def open_url(url):
    """POST the copied form fields to the old comments interface for the song in url."""
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.42',
        'referer': 'https://music.163.com/'}
    # res = requests.get(url, headers=headers, proxies=proxies)
    # The two values taken from 小甲鱼's tutorial (these work).
    params = "EuIF/+GM1OWmp2iaIwbVdYDaqODiubPSBToe5EdNp6LHTLf+aID/dWGU6bHWXS0jD9pPa/oY67TOwiicLygJ+BhMkOX/J1tZMhq45dcUIr6fLuoHOECYrOU6ySwH4CjxxdbW3lpVmksGEdlxbZevVPkTPkwvjNLDZHK238OuNCy0Csma04SXfoVM3iLhaFBT"
    encSecKey = "db26c32e0cd08a11930639deadefda2783c81034be6445ca8f4fbedd346e1f9567375083aeb1a85e6ad6d9ae4532a49752c2169db8bcc04d38a79f9bed7facea42ee23f1b33538c34f82741318d9b4b846663b53b0b808dd0499dccfbc6c61fbf180c6fb24b1c2dd3c2c450ce09917d74be9424dab836fd2e671988ffbc6ae1b"
    # The two values I copied from my own browser (these fail).
    # encSecKey = " 6d407605344a08d156d8dd7251c756b8c98a69ef8dd670ffcdc5d7db0019cbfa283a07707363c9a54a6d749b2fceda32b4e450d2988741f4c48df304263fa213de85ec14425bf2c5cd3c64c3fd7bf90d9bfed66438d02c8d60078a09ecc2273be30aa6fbe4082c3dd3f18cb23efbfeab6f6209c173147b6f8f768be296c6a5b5"
    # params = "aYI5Lgk6cIIEOKVLrbjf3jakQsRFxSFygJEu7CJ5pQ + YHT5Jgt3KLeFlUAKCp2zZDrfz + a1eSoRNJa + RWGsRaT + O1k3wGP6IgB8qd3oPjas2rpqngCgxh9ymYk2z0Qn4gU8pd2cJ8uiEBsTT3S0d5tLOEUAS + qbjRD9gI / H3XGeRNH8HBTUBz0 / P / cB4dvFcKS76lynAiuzZiHdrgmcWDK7MEa0r + uDdpDDtjutZpJBPXHvhN20L6 + KbkskBqEtrWYDV9YiZEVzC3tNZ / LQQNveA0pq + X + tl70vac5IUqfI ="
    data = {
        "params": params,
        "encSecKey": encSecKey}
    # The song ID is the part of the page URL after '='.
    name_id = url.split('=')[1]
    target_url = "http://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    res = requests.post(target_url, headers=headers, data=data)
    return res


def get_hot_comments(res):
    """Write each hot comment's nickname and text to hot_comments111.txt."""
    comments_json = json.loads(res.text)
    hot_comments = comments_json['hotComments']
    with open('hot_comments111.txt', 'w', encoding='utf-8') as file:
        for each in hot_comments:
            file.write(each['user']['nickname'] + ':\n\n')
            file.write(each['content'] + '\n')
            file.write("---------------------------------------\n")


def find_data(res):
    """Collect the text of every comment <div> found in an HTML response."""
    data = []
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    content = soup.find_all("div", class_="cnt f-brk")
    for each in content:
        data.append(each.text)
    return data


def main():
    url = input("Enter the song page URL: ")
    res = open_url(url)
    get_hot_comments(res)


if __name__ == '__main__':
    main()
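A side note on the song ID step in the script above: splitting the page URL on '=' only works when the ID is the sole query parameter. The following is a minimal sketch of a stricter alternative using only the standard library. The helper name extract_song_id and the sample URL are made up for illustration, and the sketch assumes the page URL carries the ID in an id= parameter, possibly after a '#' as the web player does.

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(url):
    """Return the numeric song ID from a song page URL (hypothetical helper)."""
    parts = urlparse(url.strip())
    query = parts.query
    # The web player puts its route after '#', so the query string
    # may sit inside the fragment instead of the normal query part.
    if not query and '?' in parts.fragment:
        query = parts.fragment.split('?', 1)[1]
    ids = parse_qs(query).get('id')
    if not ids or not ids[0].isdigit():
        raise ValueError("no numeric id parameter in URL: {!r}".format(url))
    return ids[0]


if __name__ == '__main__':
    # Sample URL for illustration only.
    print(extract_song_id("https://music.163.com/#/song?id=4466775"))
```

The returned string can be passed to .format() in place of name_id when building target_url.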
Quote from siryang, posted 2023-5-26 16:23:
# @Time: 2023/5/26 13:27
# @Author: YL
# @File: pachong_wangyiyun.py

isdkz (posted 2023-5-26 17:06):
This code runs fine for me.
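siryang (posted 2023-5-26 17:18):
Right, the code itself is fine; it is 小甲鱼's tutorial code. What I mean is that when I follow 小甲鱼's steps but use my own params and encSecKey, the request fails. Those are the two values commented out in my code.
Second, the request URL in the NetEase Cloud Music request headers no longer shows the numeric song ID, and using the current request URL directly returns {"msg":"参数错误","code":400}.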
isdkz (posted 2023-5-26 17:37):
How did you copy them? How did so many spaces end up in the copied values?
They copy without any problem for me.
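siryang (posted 2023-5-26 18:13):
It really was the spaces. After I deleted them by hand it worked. I don't know why copying from Edge adds spaces for me.
How do I solve the second problem?

Reply:
They copy from Edge without any problem for me.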
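Instead of deleting the spaces by hand, the copied values can be cleaned in code. This is a minimal sketch, assuming params is Base64 text and encSecKey is a hexadecimal string (neither format legitimately contains whitespace); the helper names and the shortened sample strings are made up for illustration.

```python
import string


def clean_copied_value(value):
    """Remove every whitespace character from a value pasted from the browser."""
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return "".join(value.split())


def is_hex(value):
    """Return True if value is non-empty and made only of hexadecimal digits."""
    return bool(value) and all(ch in string.hexdigits for ch in value)


if __name__ == '__main__':
    # Shortened sample values for illustration only, not real request data.
    raw_params = "aYI5Lgk6 + YHT5Jgt3 / H3XGeRN ="
    raw_key = " 6d407605344a08d1"
    print(clean_copied_value(raw_params))
    print(is_hex(clean_copied_value(raw_key)))
```

Running both copied values through clean_copied_value before building the data dict avoids depending on how a particular browser formats the copied text.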
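As for the second question: the endpoint used before is the old interface. For the new interface you have to work out how its parameters are constructed, which means reverse-engineering the site's JavaScript.
You can refer to this: https://blog.csdn.net/weixin_51152456/article/details/121358796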
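While working out the new interface's parameters, it may help to report the server's error message directly, since a rejected request such as {"msg":"参数错误","code":400} has no hotComments field and would otherwise fail with a KeyError. This is a minimal sketch; the helper name check_response is made up, and treating code 200 as success is an assumption rather than something stated in this thread.

```python
import json


def check_response(text):
    """Parse a JSON response body and raise if the server reported an error."""
    payload = json.loads(text)
    code = payload.get("code")
    # Assumption: the interface signals success with code 200.
    if code != 200:
        raise RuntimeError(
            "request rejected: code={}, msg={}".format(code, payload.get("msg")))
    return payload


if __name__ == '__main__':
    try:
        check_response('{"msg":"参数错误","code":400}')
    except RuntimeError as err:
        print(err)
```

In the script posted earlier, get_hot_comments could call check_response(res.text) in place of json.loads(res.text), so a parameter error is reported as such.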