A question about find and find_all when scraping songs with Python (Python Discussion, Programming Languages, FishC Forum)

jiaozhulhh posted on 2021-11-19 10:50:38

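I'm scraping the hot songs chart on NetEase Cloud Music. I imported the BeautifulSoup library and wrote a get_music function; part of its code is as follows:
import requests
from bs4 import BeautifulSoup

url = 'http://music.163.com/discover/toplist?id=3778678'  # NetEase Cloud Music, Hot Songs chart
headers = {'User-Agent': 'xxx'}
res = requests.get(url, headers=headers)
res_text = res.text
music_soup = BeautifulSoup(res_text, 'html.parser')
music_list = music_soup.find('ul', class_="f-hide").find_all('a')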

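Could someone help me with this? I don't understand why find is called with ul, since the page source doesn't seem to contain any ul tag at all. Also, why is class_="f-hide" needed? And what does the result look like when find and find_all are chained together?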

Part of the page source is:
<tr id="18914695461637027019420" class="even ">
<td>
<div class="hd">
<span class="num">1</span>
<div class="rk ">
<span class="ico u-icn u-icn-72 s-fc4">0</span>
</div>
</div>
</td>
<td class="rank">
<div class="f-cb">
<div class="tt">
<a href="/song?id=1891469546"><img class="rpic" src="http://p4.music.126.net/nNg4YjkcK1AwCX1FrN8VOQ==/109951166578333625.jpg?param=50y50&quality=100">
</a>
<span data-res-id="1891469546" data-res-type="18" data-res-action="play" class="ply "> </span>
<div class="ttc">
<span class="txt">
<a href="/song?id=1891469546">
<b title="删了吧 - (要不你还是把我删了吧)">
"删"
<div class="soil">光反中</div>
"了吧"
</b>
</a>
<span title="要不你还是把我删了吧" class="s-fc8">
"- (要不你还是把"
<div class="soil">黝佒睑綜</div>
"我删了吧)"
</span>
</span>
</div>
</div>
</div>
</td>
<td class=" s-fc3">
<span class="u-dur ">03:24</span>
<div class="opt hshow">
<a class="u-icn u-icn-81 icn-add" href="javascript:;" title="添加到播放列表" hidefocus="true" data-res-type="18" data-res-id="1891469546" data-res-action="addto"></a><span data-res-id="1891469546" data-res-type="18" data-res-action="fav" class="icn icn-fav" title="收藏"></span><span data-res-id="1891469546" data-res-type="18" data-res-action="share" data-res-name="删了吧" data-res-author="烟(许佳豪)" data-res-pic="http://p4.music.126.net/nNg4YjkcK1AwCX1FrN8VOQ==/109951166578333625.jpg" class="icn icn-share" title="分享">分享</span><span data-res-id="1891469546" data-res-type="18" data-res-action="download" class="icn icn-dl" title="下载"></span>
</div>
</td>
<td class="">
<div class="text" title="烟(许佳豪)">
<span title="烟(许佳豪)">
<a class="" href="/artist?id=49937403" hidefocus="true">
"烟("
<div class="soil">鰔</div>
"许佳豪)"
</a>
</span>
</div>
</td>
</tr>

suchocolate posted on 2021-11-19 23:45:15

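1) The HTML you see in the browser is the result after CSS and JS have been applied, and JS in particular can change the markup. The HTML the crawler receives has not been rendered; it is the raw response, so it will differ from what the browser shows. You can save the crawler's HTML with the code below and look through it carefully; you will find plenty of ul tags.
res_text = res.text
with open('res.txt', 'w', encoding='utf-8') as f:
    f.write(res_text)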
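
Instead of reading the saved file by eye, the ul tags in the raw HTML can also be listed programmatically. The following is only a minimal sketch using the standard library's html.parser; the names UlCollector, list_ul_classes and sample_html are made up for illustration, and sample_html is a stand-in for the real res.text, not the actual page content.

```python
from html.parser import HTMLParser


class UlCollector(HTMLParser):
    """Collect the class attribute of every <ul> start tag."""

    def __init__(self):
        super().__init__()
        self.ul_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            # attrs is a list of (name, value) pairs for this tag
            self.ul_classes.append(dict(attrs).get("class"))


def list_ul_classes(html):
    """Return the class attribute of each <ul> found in html."""
    parser = UlCollector()
    parser.feed(html)
    return parser.ul_classes


# Hypothetical sample standing in for res.text
sample_html = (
    '<ul class="nav"><li>Home</li></ul>'
    '<ul class="f-hide"><li><a href="/song?id=1">Song A</a></li></ul>'
)
print(list_ul_classes(sample_html))  # ['nav', 'f-hide']
```

Calling list_ul_classes(res.text) on the real response should show whether a ul with class f-hide is present in what the crawler actually received.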

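2) music_soup.find('ul', class_="f-hide").find_all('a'): this first finds the ul (find returns only the first matching tag), and then finds every a tag underneath that ul (find_all returns all of them as a list-like result).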
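
To see what the chained call returns, here is a small sketch on a made-up HTML snippet. The structure of sample_html (a ul with class f-hide holding one a tag per song) is an assumption for illustration, not a copy of the real page. It also touches on the class_ part of the question: class is a reserved word in Python, so BeautifulSoup accepts the keyword argument class_ instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on a hidden song list
sample_html = """
<ul class="nav"><li><a href="/discover">Discover</a></li></ul>
<ul class="f-hide">
  <li><a href="/song?id=1">Song A</a></li>
  <li><a href="/song?id=2">Song B</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find returns the first matching Tag, or None when nothing matches
hidden_list = soup.find("ul", class_="f-hide")

songs = []
if hidden_list is not None:
    # find_all searches only inside hidden_list, so the nav link is skipped
    for link in hidden_list.find_all("a"):
        songs.append((link.get_text(strip=True), link.get("href")))

print(songs)  # [('Song A', '/song?id=1'), ('Song B', '/song?id=2')]
```

Checking for None before calling find_all is a defensive choice: if the ul is missing from the response (for example because the request was blocked), the chained form would raise an AttributeError.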

jiaozhulhh posted on 2021-11-20 00:58:15

I tried it and it makes sense now. Thanks!
