How do I save it? I'm not sure how; I did manage to scrape the page, though.
import requests
import bs4
import time
# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
response = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(response.request.headers)
res = response.text
# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    print(each.a['href'])
It's not that simple.
That link is the playback page URL, not a download URL. If you don't believe me, paste it into a Thunder (Xunlei) download window and you'll see.
Youku and the other big streaming sites all have anti-scraping measures,
so getting at the actual video source files with a crawler is going to be hard.
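A quick way to see the point above: the scraped `href` ends in `.html`, i.e. it points at a web page, not a media file. A minimal sketch (the helper name `looks_like_media` and the example URLs are illustrative, and this only inspects the URL path, not the server's actual response):

```python
import mimetypes
from urllib.parse import urlparse

def looks_like_media(url: str) -> bool:
    """Rough check: does the URL path look like a video file?"""
    path = urlparse(url).path
    guessed, _ = mimetypes.guess_type(path)
    return guessed is not None and guessed.startswith("video/")

# A playback page URL like the one scraped above: not a media file
print(looks_like_media("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html"))  # False
# A direct file URL would look different
print(looks_like_media("https://example.com/clip.mp4"))  # True
```

So even a download tool like Thunder can only fetch the page's HTML from that link, never the video stream itself.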
Save the page you scraped as HTML:
import requests
import bs4
import time
# Request the page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
response = requests.get("https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546", headers=headers)
# print(response.request.headers)
res = response.text
# Parse the page
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    print(each.a['href'])
# Save the page source to disk
with open('优酷.html', 'w', encoding='utf-8') as fp:
    fp.write(res)
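The save step above can also be wrapped in a small helper so it's reusable across pages. A minimal sketch with `pathlib` (the function name `save_page` and the demo filename are illustrative):

```python
from pathlib import Path

def save_page(html: str, filename: str) -> Path:
    """Write page source to disk as UTF-8 and return the saved path."""
    path = Path(filename)
    path.write_text(html, encoding="utf-8")
    return path

# Demo with a tiny stand-in for the scraped source (the real `res`
# would come from response.text as in the snippet above)
saved = save_page("<html><body>demo</body></html>", "demo.html")
print(saved.read_text(encoding="utf-8"))
```

`Path.write_text` handles opening and closing the file for you, which avoids forgetting the `encoding='utf-8'` argument each time.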
For saving the video itself, see posts #2 and #3. 逃兵 posted on 2021-2-1 10:47:
Save the page you scraped as HTML.
For saving the video itself, see posts #2 and #3.
Thanks!