FishC Forum

Views: 3202 | Replies: 10

[Solved] How do I scrape every song in a given NetEase Cloud Music playlist?

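Posted on 2022-10-1 10:31:19
Bounty: 60 FishC coins
Given a URL such as https://music.163.com/#/playlist?id=2916766519,
scrape the title and artist of every song at that URL and return them as a list.
How should the code be written?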
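One detail worth handling first: in a URL like https://music.163.com/#/playlist?id=2916766519, everything after `#` is a fragment, which the browser keeps to itself and never sends to the server. That is why the replies in this thread fetch https://music.163.com/playlist?id=... instead. A minimal sketch of normalizing the input URL with the standard library; `playlist_id` and `playlist_page_url` are hypothetical helper names, not part of any NetEase API:

```python
from urllib.parse import parse_qs, urlsplit


def playlist_id(url: str) -> str:
    """Extract the playlist id from either URL form:
    https://music.163.com/#/playlist?id=2916766519  (query inside the fragment)
    https://music.163.com/playlist?id=2916766519    (ordinary query string)
    """
    parts = urlsplit(url)
    query = parts.query
    # In the '#/playlist?id=...' form, urlsplit puts everything after '#' in the fragment
    if not query and "?" in parts.fragment:
        query = parts.fragment.split("?", 1)[1]
    ids = parse_qs(query).get("id")
    if not ids:
        raise ValueError(f"no playlist id found in {url!r}")
    return ids[0]


def playlist_page_url(url: str) -> str:
    """Build the fragment-free page URL, which is what actually reaches the server."""
    return "https://music.163.com/playlist?id=" + playlist_id(url)


print(playlist_page_url("https://music.163.com/#/playlist?id=2916766519"))
# https://music.163.com/playlist?id=2916766519
```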
Best answer (2022-10-1 10:31:20):
https://fishc.com.cn/forum.php?mod=viewthread&tid=159468

Want to know what Xiaojiayu (小甲鱼) has been up to lately? Visit -> ilovefishc.com

Posted on 2022-10-1 10:31:20 from FishC Mobile | This post is the best answer
https://fishc.com.cn/forum.php?mod=viewthread&tid=159468

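Posted on 2022-10-1 11:11:20
If you scrape the web version, it may only display ten entries.
A workaround is to add the playlist to your own playlists in the client app; the web page then displays all entries and you can scrape them.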

Original poster | Posted on 2022-10-1 15:56:44
wp231957 posted on 2022-10-1 11:05
https://fishc.com.cn/forum.php?mod=viewthread&tid=159468

  File "C:\Users\Ad\PycharmProjects\pythonProject2\main.py", line 32
    filename = x[1].translate(str.maketrans("", "", "*?'" /\\ | <: > :"))
                                                            ^
SyntaxError: unexpected character after line continuation character
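Why line 32 fails: inside the double-quoted string, the `"` right after `*?'` closes the literal, so `/\\` is parsed as code, and a backslash outside a string is read as a line-continuation character. It looks as if the backslash that escaped that inner quote was lost when the code was copied from the forum. A quick check with the built-in `compile()`; the helper name `compiles` is made up for illustration:

```python
def compiles(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<line 32>", "eval")  # only parses and compiles; nothing is executed
        return True
    except SyntaxError:
        return False


# The expression from line 32 as shown in the traceback: the inner " is unescaped
broken = r'''x[1].translate(str.maketrans("", "", "*?'" /\\ | <: > :"))'''
# The intended character set, with the inner double quote escaped as \"
fixed = r'''x[1].translate(str.maketrans("", "", "*?'\"/\\|<>:"))'''

print(compiles(broken))  # False
print(compiles(fixed))   # True
```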

Posted on 2022-10-1 19:02:00
叼辣条闯世界 posted on 2022-10-1 15:56
File "C:\Users\Ad\PycharmProjects\pythonProject2\main.py", line 32
    filename = x[1].translat ...

I thought the old code had run into a new problem, but it still runs fine for me.

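Original poster | Posted on 2022-10-1 19:39:39
wp231957 posted on 2022-10-1 19:02
I thought the old code had run into a new problem, but it still runs fine for me.

Could you explain exactly how to use it? Thanks.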

Posted on 2022-10-1 19:44:46 from FishC Mobile
叼辣条闯世界 posted on 2022-10-1 19:39
Could you explain exactly how to use it? Thanks.

Post your code.

Original poster | Posted on 2022-10-1 19:50:52


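I had to change lines 31 and 32 to get it to run:
# Line 31: remove characters that are not allowed in (Windows) file names, one replace() per character
filename = x[1].replace("*", "").replace("?", "").replace("'", "").replace('"', "").replace("/", "").replace("\\", "").replace("|", "").replace("<", "").replace(":", "").replace(">", "").replace(":", "")
# Line 32: the same in a single call; the double quote inside the string must be escaped as \"
filename = x[1].translate(str.maketrans("", "", "*?'\"/\\|<:>:"))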
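`str.translate` with a deletion table from `str.maketrans("", "", chars)` removes every listed character in a single pass, which is what a chain of `replace()` calls does one character at a time. A minimal sketch comparing the two; the names `ILLEGAL`, `sanitize_replace` and `sanitize_translate` are illustrative only:

```python
# Characters to strip from a song title before using it as a file name:
# the ones Windows forbids in file names, plus the single quote the thread also removes
ILLEGAL = "*?'\"/\\|<>:"


def sanitize_replace(name: str) -> str:
    """One str.replace() per character, like the chained version."""
    for ch in ILLEGAL:
        name = name.replace(ch, "")
    return name


def sanitize_translate(name: str) -> str:
    """Single pass: the third argument of maketrans lists characters to delete."""
    return name.translate(str.maketrans("", "", ILLEGAL))


title = 'a/b\\c:d*e?f"g<h>i|j\'k'
print(sanitize_translate(title))                             # abcdefghijk
print(sanitize_replace(title) == sanitize_translate(title))  # True
```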

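Posted on 2022-10-1 20:37:30
After obtaining the cookies, send a GET request:
import requests

# Cookies and headers captured from a browser session; they expire, so replace them with your own
cookies = {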
    'timing_user_id': 'time_0mfxPt9HTB',
    '_ga': 'GA1.1.104650372.1664026337',
    'Qs_lvt_382223': '1664026337',
    'Qs_pv_382223': '4006535718092135000',
    '_clck': 'rgq7vf|1|f55|0',
    '_ga_C6TGHFPQ1H': 'GS1.1.1664026336.1.1.1664026348.0.0.0',
    '_iuqxldmzr_': '32',
    '_ntes_nnid': '7d460b28d2e32d2e5aebca855522cfbc,1664591332006',
    '_ntes_nuid': '7d460b28d2e32d2e5aebca855522cfbc',
    'NMTID': '00OH9338hyUXwej-k4bhmwXohtwN3sAAAGDkWC1tA',
    'WNMCID': 'efpkop.1664591333378.01.0',
    'WEVNSM': '1.0.0',
    'WM_NI': 'MpHE%2FuRGfm1Da9ObFLHFbsNVmHdb6wK6K4p%2FL3GI%2Bl6CVjHcaU5dF8Y%2FRmxK0tZ3bs12poL%2B7E50umq0Hy2Ztombl2FJogONf3IEl6PBot%2F%2Bv%2FnWJyGl5eXOxryb5SyPdVA%3D',
    'WM_NIKE': '9ca17ae2e6ffcda170e2e6eed5d7648786e5aed943b8e78fb7c54e969b9a86c844ad928886d449a7e8faa5d92af0fea7c3b92a8fb088a6ae7f9299aa8eb450a3e78183c5488c9aae94cd5eaf98ac83b57d89ea85d5f32590ed89aaaa68f69d9a8bcb69a38bfb85c5349bf19e8efb41f4ac96d4e179f3eec090cf21f59bfbd1f867b3f0889ab152b8b1a383ef3a8696a188ce46a6ed87a6ef3e949ea195ae21b6f1a2b8c97385b4fb86db45bcaeb783b859bb90acb7c437e2a3',
    'WM_TID': 'k43up4VgtS5FQUABFAaFHtZXOFHf4gom',
    'playerid': '90015427',
    'JSESSIONID-WYYY': 'cP4j5wVZXAvuWFZXFzkKcp8YT7UnQfc9iyoK4p3dgckP433SndnU4s6nI%5CwcVN%2BqNYI%2BO31ZYy19hsFXh%2FFo4HrjpYz7z6vSiZqTnNkUmMe%2FSbwZBxO3wtf7TXDYjvhyx9e0cTjirZKxEScI%2Fd6m1Krtkd7opJbMNi8qkDSlXogw2oWO%3A1664626837530',
}

headers = {
    'authority': 'music.163.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '"Chromium";v="21", " Not;A Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'iframe',
    'referer': 'https://music.163.com/',
    'accept-language': 'zh-CN,zh;q=0.9',
    # Requests sorts cookies= alphabetically
    # 'cookie': 'timing_user_id=time_0mfxPt9HTB; _ga=GA1.1.104650372.1664026337; Qs_lvt_382223=1664026337; Qs_pv_382223=4006535718092135000; _clck=rgq7vf|1|f55|0; _ga_C6TGHFPQ1H=GS1.1.1664026336.1.1.1664026348.0.0.0; _iuqxldmzr_=32; _ntes_nnid=7d460b28d2e32d2e5aebca855522cfbc,1664591332006; _ntes_nuid=7d460b28d2e32d2e5aebca855522cfbc; NMTID=00OH9338hyUXwej-k4bhmwXohtwN3sAAAGDkWC1tA; WNMCID=efpkop.1664591333378.01.0; WEVNSM=1.0.0; WM_NI=MpHE%2FuRGfm1Da9ObFLHFbsNVmHdb6wK6K4p%2FL3GI%2Bl6CVjHcaU5dF8Y%2FRmxK0tZ3bs12poL%2B7E50umq0Hy2Ztombl2FJogONf3IEl6PBot%2F%2Bv%2FnWJyGl5eXOxryb5SyPdVA%3D; WM_NIKE=9ca17ae2e6ffcda170e2e6eed5d7648786e5aed943b8e78fb7c54e969b9a86c844ad928886d449a7e8faa5d92af0fea7c3b92a8fb088a6ae7f9299aa8eb450a3e78183c5488c9aae94cd5eaf98ac83b57d89ea85d5f32590ed89aaaa68f69d9a8bcb69a38bfb85c5349bf19e8efb41f4ac96d4e179f3eec090cf21f59bfbd1f867b3f0889ab152b8b1a383ef3a8696a188ce46a6ed87a6ef3e949ea195ae21b6f1a2b8c97385b4fb86db45bcaeb783b859bb90acb7c437e2a3; WM_TID=k43up4VgtS5FQUABFAaFHtZXOFHf4gom; playerid=90015427; JSESSIONID-WYYY=cP4j5wVZXAvuWFZXFzkKcp8YT7UnQfc9iyoK4p3dgckP433SndnU4s6nI%5CwcVN%2BqNYI%2BO31ZYy19hsFXh%2FFo4HrjpYz7z6vSiZqTnNkUmMe%2FSbwZBxO3wtf7TXDYjvhyx9e0cTjirZKxEScI%2Fd6m1Krtkd7opJbMNi8qkDSlXogw2oWO%3A1664626837530',
}

params = {
    'id': '2916766519',
}

response = requests.get('https://music.163.com/playlist', params=params, cookies=cookies, headers=headers)

print(response.request.url)
print(response.text)

Result:
<ul class="f-hide"><li><a href="/song?id=1455273374">风的小径</a></li><li><a href="/song?id=1369034842">枫桥雨</a></li><li><a href="/song?id=1931264432">君だけの晴れ</a></li><li><a href="/song?id=1445342841">あなたと一緒に夏を過ごしたい</a></li><li><a href="/song?id=1448989137">-秋日和がもしも…-</a></li><li><a href="/song?id=1873575452">﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌﹌</a></li><li><a href="/song?id=1847589085">柒·壹</a></li><li><a href="/song?id=1438574395">ㅤ</a></li><li><a href="/song?id=1981619786">一颗流星の愿望清单</a></li><li><a href="/song?id=1912189078">L u c k y 。</a></li></ul>
<textarea id="song-list-pre-data" style="display:none;">Qvw9IS37axsdjkffyjoME2J0KzonFO6Pw9LWJ4gURRQwtEqDhiPzabwfvejME210KzotpdZKA9I1u0UkgQxaDY
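The song titles can be pulled out of that `<ul class="f-hide">` list with the standard library's `html.parser`. A minimal sketch, assuming the page keeps the markup shown above (`<a href="/song?id=...">title</a>` inside `<ul class="f-hide">`). Note that this list carries only song IDs and titles, not artist names, and, as an earlier reply points out, the web page may list only the first ten songs. `FHideSongParser` and `parse_songs` are hypothetical names:

```python
from html.parser import HTMLParser


class FHideSongParser(HTMLParser):
    """Collect (song_id, title) pairs from <a href="/song?id=..."> links inside <ul class="f-hide">."""

    def __init__(self) -> None:
        super().__init__()
        self.in_list = False    # currently inside <ul class="f-hide">
        self.current_id = None  # song id of the <a> being read, if any
        self.songs = []         # list of (song_id, title) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "f-hide" in (attrs.get("class") or "").split():
            self.in_list = True
        elif tag == "a" and self.in_list:
            href = attrs.get("href") or ""
            if href.startswith("/song?id="):
                self.current_id = href[len("/song?id="):]

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "a":
            self.current_id = None

    def handle_data(self, data):
        if self.current_id is not None:
            self.songs.append((self.current_id, data))


def parse_songs(html: str) -> list:
    parser = FHideSongParser()
    parser.feed(html)
    parser.close()
    return parser.songs


# With the page fetched above: parse_songs(response.text)
sample = ('<ul class="f-hide"><li><a href="/song?id=1455273374">风的小径</a></li>'
          '<li><a href="/song?id=1369034842">枫桥雨</a></li></ul>')
print(parse_songs(sample))
# [('1455273374', '风的小径'), ('1369034842', '枫桥雨')]
```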

Posted on 2022-10-1 21:19:02
Method 2 (selenium):
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

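# url = input('Enter the URL: ')
url = 'https://music.163.com/playlist?id=2916766519'
option = webdriver.ChromeOptions()
# Commonly used to hide the navigator.webdriver automation flag that some sites check
option.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=option)  # pass the options in, otherwise they are never used
driver.get(url)
time.sleep(3)  # give the page time to load
# The song table sits inside an iframe (named 'g_iframe'), so switch into it before searching
#driver.switch_to.frame('g_iframe')

driver.switch_to.frame(0)
driver.implicitly_wait(3)
# Song titles: the title attribute of the <b> element in each table row
song_elements = driver.find_elements(By.XPATH, '//tbody/tr//b')
# Artists: the title attribute of the text div in the 4th column of each row
singer_elements = driver.find_elements(By.XPATH, '//tbody/tr/td[4]/div[@class="text"]')
songs = []
singers = []
for each in song_elements:
    songs.append(each.get_attribute('title'))
for each in singer_elements:
    singers.append(each.get_attribute('title'))

print(songs)
print(singers)
driver.quit()  # close the browser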
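The question asks for a single list; once `songs` and `singers` have been collected, the two can be paired row by row. A small sketch (the helper `pair_rows` is hypothetical) that raises an error when the two XPath queries returned different numbers of elements, instead of silently dropping rows:

```python
def pair_rows(songs, singers):
    """Combine parallel lists into [(song, singer), ...], one tuple per table row."""
    if len(songs) != len(singers):
        raise ValueError(
            f"{len(songs)} songs but {len(singers)} singers; check the XPath queries"
        )
    return list(zip(songs, singers))


# With the lists built by the selenium script: playlist = pair_rows(songs, singers)
print(pair_rows(["song A", "song B"], ["artist A", "artist B"]))
# [('song A', 'artist A'), ('song B', 'artist B')]
```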

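Posted on 2022-10-1 21:20:48
The tricky parts of this problem are:
with the requests module, you need to find the right request (URL) and send the right headers;
with the selenium module, make sure to switch into the outermost iframe (`g_iframe`) before locating elements.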