FishC Forum


[Solved] Scraping the featured comments from NetEase Cloud Music

Posted on 2020-2-12 10:45:19

Last edited by xiangzhihengkan on 2020-2-12 10:48

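The code runs without errors, but the file it saves contains only {"msg":"Cheating","code":-460,"message":"Cheating"}
I also added the cookie to data, but it still does not work.

Could someone take a look and tell me whether the code is the problem?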

import requests

def get_comments(url):
    # The song id is whatever follows the first '=' in the URL
    name_id = url.split('=')[1]
    headers = {
        'user_agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36',
        'referer':'https://music.163.com/song?id={}'.format(name_id)
        }
    # Form fields and cookie string as posted in the thread
    params = 'bAlZeFbErjR3OaTk2dItfjTsyEHewPeLY2ZbyUowgmWGBu7+LPquSyuAs+5NHJrqpLQQ0WGhyAIqSqXd0ZqT6HZG1uIo8+AxxEeLqSKt1NiGaTB7YAjR4FEZXaQqwpdjPIjivqi/ECSBaONhGdhKcnUkEvAXEox7oQo4bMICXSHeOsyW8lYfIQzRkwK5PQ9g'
    encSecKey = '2f6816b0e71f96e8c72fafaaa3634c23cbc43ebf5f5088b5f499e877893e98898059fed97a0fe125c9b00a44ea2996314745e91d75d4cf95f2fa26145492c401b8d02a6eb3451bb49e7df3a57d52f5ad78ae3df477b332f0e5f91df68ccc20d66fee26fd8112f52414cf9139ff74100da59e10f45009063142a4fc1438a5eeb9'
    cookie = 'JSESSIONID-WYYY=OqYRslUe3vwNR5pOqud591rgdtBxstHb%5CznkpNTtMGVnjIQjz2zaEPMNgyPRW13ffDw%2Fc%2B4e5onRY7OKbGJlUlffpJs9%2FYkSF3BgUoluDHW5YaFguft3VOjhk2eSdiD9mUHoc7Zk7WFN2CC55K7%2FUh%5CwAtKPWbREEIZwHrKMfiUn9Pfk%3A1581473881430; _iuqxldmzr_=32; _ntes_nnid=5ccd50e65676e6cebfc754e7d304ba05,1581472081493; _ntes_nuid=5ccd50e65676e6cebfc754e7d304ba05; WM_NI=LXb%2BxNhlEK3VE5bmo46ZJqUWC9V1A%2FNUhPr%2BQ5xuF9FfKndK8QMmJOyL6KefwXpPC%2BDkfeZmiIKRotNmqp0ujIu3LIwchhyQ7LuHtWIhhFy8gUZSuDq5RVxfu%2BFCAU3KZU4%3D; WM_NIKE=9ca17ae2e6ffcda170e2e6ee99dc7e869f8fa6db64b6b08aa7d45e939b8ebaaa5a93acbfa7ed54fcb484d7ce2af0fea7c3b92a9088aeccfb46acb48bacb25e87ebae8eee40f6868692d85d81e79ad2c4258e8ba8b1f12193ebb896c53386ba858fe741fc8f838ec945bb94faace86eb588c0b1c22188b5aa86d967ab9785b3ee808eb3bf91b37faa97bbd4cf43b5bda889fc5bf594bba9f75d8bb8a8a7fc34a58b96b4f850b3eb9eb7fb6eb7f58aa4c45ff19596b6b737e2a3; WM_TID=XCP%2B2fh7KoFBBUEVEBYqAE4%2FBjHolLqe'
    data = {
        "params":params,
        "encSecKey":encSecKey,
        "cookie":cookie
        }
    target_url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    res = requests.post(target_url,headers=headers,data=data)

    return res

def main():
    url = input("Enter the song URL: ")
    #res = get_url(url)
    res = get_comments(url)

    # Save the raw response body for inspection
    with open("res.txt","w",encoding="utf-8") as file:
        file.write(res.text)


if __name__ == "__main__":
    main()
The URL I entered is: https://music.163.com/#/song?id=1404885266
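
The response quoted in the question is a small JSON object with code -460 instead of the comment data. A minimal sketch for spotting that case before writing anything to disk, assuming the body is the raw response text (for example res.text). The helper name is_blocked is hypothetical and not part of the original post:

```python
import json


def is_blocked(body):
    """Return True if body looks like the {"code": -460} reply from the question."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all (for example an HTML page), so not the -460 reply
        return False
    return isinstance(payload, dict) and payload.get("code") == -460


if __name__ == "__main__":
    sample = '{"msg":"Cheating","code":-460,"message":"Cheating"}'
    print(is_blocked(sample))
```

Calling this before file.write(res.text) would make the failure visible right away instead of only after opening res.txt.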

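Best answer
2020-2-13 09:31:08
It runs fine on my end, but when you enter the URL, be sure to remove the #. It seems to be some kind of anti-scraping mechanism. One of my own threads is about exactly this, so you can take a look there. On my machine your code retrieves a complete JSON file
(excerpt of the JSON below):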

{"isMusician":false,"userId":-1,"topComments":[],"moreHot":true,"hotComments":[{"user":{"locationInfo":null,"liveInfo":null,"experts":null,"authStatus":0,"avatarUrl":"https://p1.music.126.net/dUodYKgDBOv73H1CYSA5xQ==/109951164503755354.jpg","userId":264335643,"vipRights":null,"vipType":0,"expertTags":null,"remarkName":null,"nickname":"你得叫李大爷","userType":0},"beReplied":[],"pendantData":null,"showFloorComment":null,"status":0,"commentId":1687555689,"content":"这一年武当大雪,掌教李玉斧带回了一个叫余福的徒弟。\n年轻掌教背着孩子上山时,昏昏睡去的孩子手里攥紧了一串舍不得吃的鲜红糖葫芦。\n登顶武当后,背着徒弟的年轻道人远望,哽咽道:“小师叔,回山了。","time":1574353296843,"likedCount":221477,"expressionUrl":null,"commentLocationType":0,"parentCommentId":0,"decoration":{},"repliedMark":null,"liked":false},{"user":{"locationInfo":null,"liveInfo":null,"experts":null,"authStatus":1,"avatarUrl":"https://p1.music.126.net/f5bNL8b42Af9ql03i4k_JA==/109951164526049010.jpg","userId":550065289,"vipRights":{"associator":{"vipCode":100,"rights":true},"musicPackage":null,"redVipAnnualCount":1},"vipType":11,"expertTags":null,"remarkName":null,"nickname":"徐泽bre","userType":4},"beReplied":[],"pendantData":null,"showFloorComment":null,"status":0,"commentId":1687490435,"content":"蟹蟹大家 我爱你们","time":1574353452718,"likedCount":219321,"expressionUrl":null,"commentLocationType":1,"parentCommentId":0,"decoration":{},"repliedMark":null,"liked":false},{"user":{"locationInfo":null,"liveInfo":null,"experts":null,"authStatus":0,"avatarUrl":"https://p1.music.126.net/835EnLwrtg9PjcLZV2SRsQ==/109951164634948296.jpg","userId":360302933,"vipRights":{"associator":{"vipCode":100,"rights":true},"musicPac
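
Once the full JSON comes back, the featured comments sit in the hotComments list shown in the excerpt. A minimal sketch for pulling out the readable parts, assuming the complete response has the same shape as the excerpt; only field names visible there (hotComments, user.nickname, content, likedCount) are used, and the function name extract_hot_comments is hypothetical:

```python
import json


def extract_hot_comments(body):
    """Return a list of (nickname, likedCount, content) tuples from the JSON text."""
    payload = json.loads(body)
    comments = []
    # hotComments may be absent in an error reply, so default to an empty list
    for item in payload.get("hotComments", []):
        comments.append((item["user"]["nickname"], item["likedCount"], item["content"]))
    return comments


if __name__ == "__main__":
    # Tiny hand-made sample in the same shape as the excerpt
    sample = json.dumps({
        "hotComments": [
            {"user": {"nickname": "徐泽bre"}, "likedCount": 219321, "content": "蟹蟹大家 我爱你们"}
        ]
    })
    for nickname, liked, content in extract_hot_comments(sample):
        print("{} ({} likes): {}".format(nickname, liked, content))
```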
Xiaojiayu's latest courses -> https://ilovefishc.com

Posted on 2020-2-12 11:22:23
A GET is enough:
import requests


url = 'https://music.163.com/#/song?id=1404885266'
headers = {'user-agent': 'firefox'}
r = requests.get(url,headers=headers)
print(r.status_code)
print(r.text)
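
One thing to watch with the URL in this snippet: everything after the # is a URL fragment, which HTTP clients do not send to the server, so as far as I can tell this request asks for the site root rather than the song page. This lines up with the advice in the best answer to drop the #. A minimal sketch of normalizing such a link with the standard library; the helper names normalize_song_url and song_id_from_url are hypothetical:

```python
from urllib.parse import parse_qs, urlsplit


def normalize_song_url(url):
    """Turn https://host/#/song?id=N into https://host/song?id=N; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.fragment.startswith("/"):
        # The real path and query are hidden inside the fragment
        return "{}://{}{}".format(parts.scheme, parts.netloc, parts.fragment)
    return url


def song_id_from_url(url):
    """Extract the id query parameter from either form of the link."""
    query = urlsplit(normalize_song_url(url)).query
    ids = parse_qs(query).get("id")
    if not ids:
        raise ValueError("no id parameter in URL: {}".format(url))
    return ids[0]


if __name__ == "__main__":
    link = "https://music.163.com/#/song?id=1404885266"
    print(normalize_song_url(link))
    print(song_id_from_url(link))
```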

Original poster | Posted on 2020-2-12 21:27:35

But fetching the hot comments should be a POST, not a GET.

Posted on 2020-2-13 09:17:16
xiangzhihengkan posted on 2020-2-12 21:27
But fetching the hot comments should be a POST, not a GET.

Put the cookie in headers, not in data.
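
A minimal sketch of what that change could look like, assuming the same endpoint and form fields as the original post. The function names build_comment_request and fetch_comments are hypothetical, and the header is spelled User-Agent with a hyphen here, since a key written as user_agent is sent as a differently named header. The params, encSecKey and cookie values are placeholders to be copied from your own browser session:

```python
def build_comment_request(song_id, params, enc_sec_key, cookie):
    """Return (url, headers, data) with the cookie carried as a header, not a form field."""
    url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(song_id)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36",
        "Referer": "https://music.163.com/song?id={}".format(song_id),
        "Cookie": cookie,  # moved here from the form data
    }
    data = {"params": params, "encSecKey": enc_sec_key}
    return url, headers, data


def fetch_comments(song_id, params, enc_sec_key, cookie):
    """Send the POST; requests is imported here so the builder above has no dependencies."""
    import requests
    url, headers, data = build_comment_request(song_id, params, enc_sec_key, cookie)
    return requests.post(url, headers=headers, data=data, timeout=10)


if __name__ == "__main__":
    # Placeholder values only; this prints the prepared pieces without sending anything
    url, headers, data = build_comment_request("1404885266", "PARAMS", "ENC_SEC_KEY", "NAME=VALUE")
    print(url)
    print(sorted(headers))
    print(sorted(data))
```

Splitting the builder from the network call keeps the header layout easy to inspect without hitting the site.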


Original poster | Posted on 2020-2-13 20:58:53
suchocolate posted on 2020-2-13 09:17
Put the cookie in headers, not in data.

I have changed it, thanks.

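Original poster | Posted on 2020-2-13 20:59:24
ColaPlusIce posted on 2020-2-13 09:31
It runs fine on my end, but when you enter the URL, be sure to remove the #. It seems to be some kind of anti-scraping mechanism. One of my own threads is about this ...

Done as suggested, thanks.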