How do I save it? I'm not sure how; I did manage to scrape the page, though.
import requests
import bs4
import time
# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
response = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(response.request.headers)
res = response.text
# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    print(each.a['href'])
It's not that simple.
That link is the playback page URL, not a download URL. If you don't believe me, paste it into a Thunder (Xunlei) download window and you'll see.
Youku and the other big streaming sites all have anti-scraping measures,
so getting at the actual video source files with a crawler is going to be hard.
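A quick way to see the point above: the scraped `href` ends in `.html`, i.e. it points at a web page, not a media file. A minimal sketch (the helper name `looks_like_media` and the example URLs are illustrative, and this only inspects the URL path, not the server's actual response):

```python
import mimetypes
from urllib.parse import urlparse

def looks_like_media(url: str) -> bool:
    """Rough check: does the URL path look like a video file?"""
    path = urlparse(url).path
    guessed, _ = mimetypes.guess_type(path)
    return guessed is not None and guessed.startswith("video/")

# A playback page URL like the one scraped above: not a media file
print(looks_like_media("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html"))  # False
# A direct file URL would look different
print(looks_like_media("https://example.com/clip.mp4"))  # True
```

So even a download tool like Thunder can only fetch the page's HTML from that link, never the video stream itself.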
Save the page you scraped as HTML:
import requests
import bs4
import time
# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
response = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(response.request.headers)
res = response.text
# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    print(each.a['href'])
# Save the page source to disk
with open('优酷.html', 'w', encoding='utf-8') as fp:
    fp.write(res)
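The save step above can also be wrapped in a small helper so it's reusable across pages. A minimal sketch with `pathlib` (the function name `save_page` and the demo filename are illustrative):

```python
from pathlib import Path

def save_page(html: str, filename: str) -> Path:
    """Write page source to disk as UTF-8 and return the saved path."""
    path = Path(filename)
    path.write_text(html, encoding="utf-8")
    return path

# Demo with a tiny stand-in for the scraped source (the real `res`
# would come from response.text as in the snippet above)
saved = save_page("<html><body>demo</body></html>", "demo.html")
print(saved.read_text(encoding="utf-8"))
```

`Path.write_text` handles opening and closing the file for you, which avoids forgetting the `encoding='utf-8'` argument each time.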
For saving the video itself, see posts #2 and #3. 逃兵 posted on 2021-2-1 10:47:
Save the page you scraped as HTML.
For saving the video itself, see posts #2 and #3.
Thanks!