[已解决]爬虫有关的问题

RIXO · 发表于 2018-9-21 21:44:32

一个动态网页，怎么抓取想要的字符串啊
就比如说http://www.25xz.com/player/1.shtml
这个网址
我找到了他的mp3地址为
http://bama.25xz.com/小卓玛/故乡迭部.mp3

小卓玛和故乡迭部分别是他的专辑名字和歌曲名字

但是如果用urlopen返回的html是一个静态网址，没有相应的专辑名字和歌曲名字
怎么应该获取啊

最佳答案

月排行榜 / 总排行榜

wongyusing

2018-9-21 21:44:33

本帖最后由 wongyusing 于 2018-9-22 20:18 编辑

import requests # requests是python的一个轻量级爬虫框架
from bs4 import BeautifulSoup as bs # BeautifulSoup这个名字太长了，简写成bs
# 打开网页函数
def get_response(url):
headers = {
'Host': 'www.25xz.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
'Referer': 'http://www.25xz.com/player/1.shtml',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cookie': 'ASP.NET_SessionId=qhgwada3kxhnrckccypdxgah',
}
response = requests.get(url, headers)
response.encoding = 'utf-8'
#response.encoding = 'gbk'
return response
def main():
base_url = 'http://www.25xz.com/ajax/musicList.shtml@1'
response = get_response(base_url)
print(response.text)
#soup = bs(response.text,'lxml')
#print(soup.select('.play_musicname'))
if __name__ == '__main__':
main()

复制代码

还有事情忙着，有空再写注释

跳转到最佳答案楼层

wongyusing · 发表于 2018-9-21 21:44:33

这个最佳答案由 wongyusing 给出，感谢 wongyusing 的回答。

单击隐藏图章

本帖最后由 wongyusing 于 2018-9-22 20:18 编辑

import requests # requests是python的一个轻量级爬虫框架
from bs4 import BeautifulSoup as bs # BeautifulSoup这个名字太长了，简写成bs
# 打开网页函数
def get_response(url):
headers = {
'Host': 'www.25xz.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
'Referer': 'http://www.25xz.com/player/1.shtml',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cookie': 'ASP.NET_SessionId=qhgwada3kxhnrckccypdxgah',
}
response = requests.get(url, headers)
response.encoding = 'utf-8'
#response.encoding = 'gbk'
return response
def main():
base_url = 'http://www.25xz.com/ajax/musicList.shtml@1'
response = get_response(base_url)
print(response.text)
#soup = bs(response.text,'lxml')
#print(soup.select('.play_musicname'))
if __name__ == '__main__':
main()

复制代码

还有事情忙着，有空再写注释

幽梦三影 · 发表于 2018-9-22 20:33:40

对这个URL发送请求

账号		自动登录	找回密码
密码			立即注册

[已解决]爬虫有关的问题

最佳答案

评分

浏览过的版块