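Could someone share a version that saves the results to Excel? I've been working through the Geek Python course and got stuck at this point.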
Twilight6 posted on 2020-6-14 16:06:
As far as I remember, 小甲鱼 walks through typing this code step by step in the first video, so following along with him should be enough.
老八秘制 posted on 2020-6-14 16:16:
That no longer works.
Twilight6 posted on 2020-6-14 16:30:
I just typed it in while following the video and it ran fine, so feel free to use it.

import requests
import bs4
import openpyxl


def open_url(url):
    # Send a browser-like User-Agent header with the request
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'}
    res = requests.get(url, headers=headers)
    return res


def find_movies(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Movie titles
    movies = []
    targets = soup.find_all("div", class_="hd")
    for each in targets:
        movies.append(each.a.span.text)

    # Ratings
    ranks = []
    targets = soup.find_all("span", class_="rating_num")
    for each in targets:
        ranks.append(each.text)

    # Details (credits, year, region, genre)
    messages = []
    targets = soup.find_all("div", class_="bd")
    for each in targets:
        try:
            # The forum formatting dropped the list indices here; [1] and [2]
            # are restored on the assumption that they select the credits line
            # and the year/region/genre line of the paragraph.
            lines = each.p.text.split('\n')
            messages.append(lines[1].strip() + lines[2].strip())
        except (AttributeError, IndexError):
            # Skip "bd" blocks that have no <p> or too few lines
            continue

    # Combine the three lists into one row per movie
    result = []
    length = len(movies)
    for i in range(length):
        result.append([movies[i], ranks[i], messages[i]])
    return result


# Find out how many pages there are in total
def find_depth(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    depth = soup.find('span', class_='next').previous_sibling.previous_sibling.text
    return int(depth)


def save_to_excel(result):
    wb = openpyxl.Workbook()
    ws = wb.active
    # Header row: title, rating, details
    ws['A1'] = '电影名称'
    ws['B1'] = '评分'
    ws['C1'] = '资料'
    for each in result:
        ws.append(each)
    wb.save('豆瓣TOP250.xlsx')


def main():
    host = "https://movie.douban.com/top250"
    res = open_url(host)
    depth = find_depth(res)
    result = []
    # Each page lists 25 movies, so page i starts at offset 25 * i
    for i in range(depth):
        url = host + '/?start=' + str(25 * i)
        res = open_url(url)
        result.extend(find_movies(res))
    save_to_excel(result)


if __name__ == "__main__":
    main()
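For anyone unsure what the details step in find_movies is doing, here is a minimal sketch of the split-and-strip idea on its own. The sample string is made up to resemble the text of one entry's paragraph (it is not copied from Douban), and extract_message is a hypothetical helper name; the sketch assumes the wanted pieces are the second and third newline-separated lines.

```python
# Made-up text shaped like the <p> contents of one movie entry (an assumption):
# a leading newline, two indented lines, then trailing whitespace.
sample_text = "\n    导演: Frank Darabont   主演: Tim Robbins\n    1994 / 美国 / 犯罪 剧情\n  "


def extract_message(text):
    """Join the second and third newline-separated pieces, stripped of padding."""
    parts = text.split('\n')  # a list of lines; parts[0] is the empty first line
    return parts[1].strip() + parts[2].strip()


print(extract_message(sample_text))
```

Note that split('\n') returns a list, so strip() has to be applied to individual items of that list rather than to the list itself.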
Twilight6 posted on 2020-6-14 15:46:
You're welcome~
a1437485261 posted on 2020-6-14 21:52:
Could I ask you a question over QQ? I can't seem to post images on the forum. In this code, there are three lines I don't quite understand:
targets = soup.find_all('div', class_='hd')
targets = soup.find_all('span', class_='rating_num')
targets = soup.find_all('div', class_='bd')
I compared them against Douban's page source, but I still can't make sense of them.
Twilight6 posted on 2020-6-14 21:56:
targets = soup.find_all('div', class_='hd')
Finds every div node whose class attribute is 'hd'.
targets = soup.find_all('span', class_='rating_num')
Finds every span node whose class attribute is 'rating_num'.
targets = soup.find_all('div', class_='bd')
Finds every div node whose class attribute is 'bd'.
What puzzles me is why the rating lookup uses ('span', class_='rating_num') rather than ('div', class_='star'),
and why the movie title lookup uses ('div', class_='hd') rather than ('span', class_='title').
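One way to see the difference is to try both pairs of selectors on a small piece of markup. The HTML below is a simplified, hand-written stand-in for a single Top 250 entry (an assumption; the real page has more attributes and nesting), so treat this as a sketch of the idea rather than a description of Douban's actual source.

```python
import bs4

# Simplified, hand-written markup loosely modelled on one entry (an assumption).
html = """
<div class="item">
  <div class="hd">
    <a href="#">
      <span class="title">肖申克的救赎</span>
      <span class="title"> / The Shawshank Redemption</span>
    </a>
  </div>
  <div class="bd">
    <div class="star">
      <span class="rating_num">9.7</span>
      <span>1000人评价</span>
    </div>
  </div>
</div>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# div.hd gives one node per movie; .a.span is the first <span> inside its link.
names = [hd.a.span.text for hd in soup.find_all('div', class_='hd')]

# span.title matches every title span, so one movie can yield several entries.
all_titles = [s.text for s in soup.find_all('span', class_='title')]

# span.rating_num holds only the score; div.star also contains the vote count.
scores = [s.text for s in soup.find_all('span', class_='rating_num')]
star_text = [d.get_text(strip=True) for d in soup.find_all('div', class_='star')]

print(names, all_titles, scores, star_text)
```

In this sketch, searching by ('div', class_='hd') keeps one title per movie, while ('span', class_='title') also picks up the alternate-title span; likewise ('div', class_='star') returns the score mixed with the other text in that block, so extra parsing would be needed to isolate the number.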