def parse(self, response):
    # Get the name of the ranking currently being crawled
    rank_tab = response.xpath('//ul[@class="rank-tab"]/li[@class="active"]/text()').getall()[0]
    print('=' * 50, 'Current ranking being crawled:', rank_tab, '=' * 50)
    # Each video's information sits in an li tag, so fetch all li tags first,
    # then iterate over rank_lists to read the details of each video
    rank_lists = response.xpath('//ul[@class="rank-list"]/li')
    for rank_list in rank_lists:
        rank_num = rank_list.xpath('div[@class="num"]/text()').get()
        title = rank_list.xpath('div/div[@class="info"]/a/text()').get()
        # Grab the video's url and split it to get the video id
        id = rank_list.xpath('div/div[@class="info"]/a/@href').get().split('/av')[-1]
        # Build the urls of the detail-page APIs
        Detail_link = f'https://api.bilibili.com/x/web-interface/archive/stat?aid={id}'
        Labels_link = f'https://api.bilibili.com/x/web-interface/view/detail/tags?aid={id}'
        author = rank_list.xpath('div/div[@class="info"]/div[@class="detail"]/a/span/text()').get()
        score = rank_list.xpath('div/div[@class="info"]/div[@class="pts"]/div/text()').get()
Output:
2020-06-27 13:21:06 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://api.bilibili.com/x/web-i ... /video/BV1Rz4y1Q7tg> (referer: https://www.bilibili.com/ranking/all/0/0/30)
2020-06-27 13:21:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://api.bilibili.com/x/web-i ... /video/BV1Rz4y1Q7tg>: HTTP status code is not handled or not allowed
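"aid=" should be followed by a string of digits (the id of the Bilibili video being crawled), but here it is followed by a long URL instead, so the request fails. Could anyone take a look and tell me what went wrong?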
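One possible explanation, offered as a guess rather than a confirmed diagnosis: the log shows the link ending in /video/BV1Rz4y1Q7tg, which contains no '/av'. When the separator is absent, str.split returns a one-element list holding the whole string, so [-1] yields the full URL instead of an id. The sketch below reproduces that behaviour outside Scrapy. Both href values and both helper function names are hypothetical and chosen only for illustration, and whether the two API endpoints accept a BV id in place of aid is not checked here.

```python
# Hypothetical hrefs for illustration; the real values come from the ranking page.
old_style_href = "https://www.bilibili.com/video/av12345"        # assumed older av-style link
new_style_href = "https://www.bilibili.com/video/BV1Rz4y1Q7tg"   # tail matches the log above


def extract_with_split(href):
    """Same slicing as in parse(): everything after the last '/av'."""
    return href.split('/av')[-1]


def extract_last_segment(href):
    """Alternative: take the last path segment, whatever its prefix is."""
    return href.rstrip('/').split('/')[-1]


# With an av-style link the original slicing yields the numeric id.
print(extract_with_split(old_style_href))

# With a BV-style link there is no '/av', so the whole URL comes back.
print(extract_with_split(new_style_href))

# Taking the last path segment yields the BV identifier instead.
print(extract_last_segment(new_style_href))
```

Printing the href inside the loop before slicing it would confirm whether the page really returns BV-style links.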