How do I save this? I managed to scrape the page, but I'm not sure how to save it

霓裳 · Posted on 2021-2-1 04:49:45

import requests
import bs4
import time

# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
respose = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(respose.request.headers)
res = respose.text

# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    # Print the link of the first <a> tag inside each matching div
    print(each.a['href'])

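wp231957 · Posted on 2021-2-1 09:06:56

It's not that simple.
This URL is the address of the playback page, not a download link. If you don't believe me, paste it into a 迅雷 (Xunlei) download window and see for yourself.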
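
The same check can be done without a download manager by looking at the Content-Type the server reports for a URL: a playback page normally comes back as text/html, while a directly downloadable file would report a video or binary type. Below is a minimal sketch under stated assumptions. It uses the standard library's urllib instead of requests so it runs without extra installs. The helper names `is_media_content_type` and `describe_url` are hypothetical, and the list of "media" types is only an illustrative guess, not something Youku documents.

```python
import urllib.request

# Illustrative assumption: content types that suggest a directly downloadable file.
MEDIA_PREFIXES = ("video/", "audio/")
MEDIA_TYPES = {"application/octet-stream"}


def is_media_content_type(content_type):
    """Return True if a Content-Type header value looks like a media file."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    main_type = (content_type or "").split(";")[0].strip().lower()
    return main_type.startswith(MEDIA_PREFIXES) or main_type in MEDIA_TYPES


def describe_url(url, headers=None):
    """Ask the server for headers only and report what kind of resource it is."""
    # A HEAD request skips the body; some servers reject HEAD, so treat
    # an HTTP error here as "unknown" rather than as proof either way.
    request = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "")
    kind = "media file" if is_media_content_type(content_type) else "web page or other"
    return content_type, kind
```

Calling `describe_url` on the address from the first post, with the same `headers` dictionary, would be expected to report an HTML page rather than a media file, which matches the point made above.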

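Twilight6 · Posted on 2021-2-1 09:21:53

Popular video platforms like Youku almost certainly have anti-scraping measures in place.

Getting the actual video source files with a crawler is likely to be fairly difficult.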

逃兵 · Posted on 2021-2-1 10:47:06

Save the page you scraped as an HTML file:

import requests
import bs4
import time

# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
respose = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(respose.request.headers)
res = respose.text

# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    print(each.a['href'])

# Save the fetched page source as an HTML file
with open('优酷.html', 'w', encoding='utf-8') as fp:
    fp.write(res)

If you want to save the video itself, see replies #2 and #3 above.
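
If the goal is to keep the extracted links rather than the whole page, they can be written to a plain text file in much the same way. This is only a sketch: `save_links` and the default file name `links.txt` are hypothetical, and it assumes the href values may be relative or start with `//`, so it resolves them against the page URL first.

```python
from pathlib import Path
from urllib.parse import urljoin


def save_links(hrefs, base_url, path="links.txt"):
    """Write one absolute URL per line and return the list that was written."""
    absolute = []
    for href in hrefs:
        if not href:  # skip missing or empty href values
            continue
        url = urljoin(base_url, href)  # resolves relative and //host/... forms
        if url not in absolute:  # keep the first occurrence, drop duplicates
            absolute.append(url)
    text = "".join(url + "\n" for url in absolute)
    Path(path).write_text(text, encoding="utf-8")
    return absolute
```

With the code above, the hrefs collected in the `for each in titles` loop could be gathered into a list and passed in together with the page address that was requested.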

霓裳 · Posted on 2021-2-2 01:38:16

逃兵 posted on 2021-2-1 10:47
Save the page you scraped as an HTML file

If you want to save the video itself, see replies #2 and #3 above.

Thank you!
