Posted on 2021-2-1 10:47:06
Save the page you scraped as an HTML file:
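import requests
import bs4

# Request the page (a browser User-Agent makes the request look like a normal visit)
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.53"}
url = "https://v.youku.com/v_show/id_XNDk2MzA5Nzc2MA==.html?spm=a2ha1.14919748_WEBTV_JINGXUAN.drawer4.d_zj1_4&s=adbd5cc3e8e64e668546&scm=20140719.manual.5295.show_adbd5cc3e8e64e668546&s=adbd5cc3e8e64e668546"
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
# print(response.request.headers)  # uncomment to check which headers were sent
res = response.text

# Parse the page and print the link in each "anthology-content" block
soup = bs4.BeautifulSoup(res, "html.parser")
titles = soup.find_all("div", class_="anthology-content")
for each in titles:
    link = each.a  # first <a> inside the block, or None if there is none
    if link is not None and link.get("href"):
        print(link["href"])

# Save the raw HTML so it can be opened in a browser or parsed offline later
with open("优酷.html", "w", encoding="utf-8") as fp:
    fp.write(res)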
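If you want to test the parsing without requesting Youku every time, it can help to split saving and link extraction into functions and run them on an HTML string you already have. Below is a minimal sketch under a few assumptions: it uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup (only so it runs without extra installs), the `anthology-content` class name is copied from the code above and may no longer match Youku's current markup, and the names `save_html`, `extract_links`, `AnthologyLinkParser` and the sample hrefs are hypothetical. `urllib.parse.urljoin` turns relative or protocol-relative hrefs (like `//host/path`, if the page uses them) into full URLs.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class AnthologyLinkParser(HTMLParser):
    """Collect the first <a> with an href inside each <div class="anthology-content">.

    Hypothetical helper; the class name is taken from the forum code and may have changed.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._div_depth = 0      # > 0 while inside a matching div (also counts nested divs)
        self._took_link = False  # keep only one link per block, similar to soup's each.a

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1  # nested div inside a matching block
            elif "anthology-content" in (attrs.get("class") or "").split():
                self._div_depth = 1
                self._took_link = False
        elif tag == "a" and self._div_depth and not self._took_link:
            href = attrs.get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))
                self._took_link = True

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1


def save_html(html, path):
    """Write the page text to disk as UTF-8 and return the Path."""
    path = Path(path)
    path.write_text(html, encoding="utf-8")
    return path


def extract_links(html, base_url):
    """Return absolute URLs of the links found in the anthology blocks."""
    parser = AnthologyLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample markup, not real Youku HTML
    sample = """
    <div class="anthology-content">
      <a href="//v.youku.com/v_show/id_EP1.html">1</a>
      <a href="//v.youku.com/v_show/id_EP1b.html">dup</a>
    </div>
    <div class="anthology-content other"><div><a href="/v_show/id_EP2.html">2</a></div></div>
    <div class="unrelated"><a href="/skip.html">x</a></div>
    """
    print(extract_links(sample, "https://v.youku.com/"))
    # → ['https://v.youku.com/v_show/id_EP1.html', 'https://v.youku.com/v_show/id_EP2.html']
```

In practice you would call `save_html(res, "优酷.html")` once after the request, then rerun `extract_links(Path("优酷.html").read_text(encoding="utf-8"), url)` as often as you like while adjusting the parser.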
To save just the video itself, see posts #2 and #3.