Scraping Baidu Tieba with Scrapy: the XPath is correct, so why does the crawl return nothing?, Python Discussion, Programming Languages, FishC Forum

wcq15759797758 posted on 2021-5-31 16:39:51

I'm scraping Baidu Tieba with the Scrapy framework. The XPath is correct, so why does the crawl return nothing?

import scrapy
from baidutieba.items import BaidutiebaItem


class Baidutieba1Spider(scrapy.Spider):
    name = 'baidutieba1'
    # If enabled, this must be the Tieba domain, otherwise requests are filtered as offsite
    # allowed_domains = ['tieba.baidu.com']
    start_urls = ['https://tieba.baidu.com/f?kw=python&ie=utf-8&pn=0']

    def parse(self, response):
        # Group: one <li> per thread in the thread list
        tr_list = response.xpath('//div[@id="pagelet_frs-list/pagelet/thread_list"]//li')
        for tr in tr_list:
            item = BaidutiebaItem()
            item["title"] = tr.xpath('./div/div/div/div/a/text()').extract_first()
            item['href'] = tr.xpath('./div/div/div/div/a/@href').extract_first()
            item['publish_date'] = tr.xpath('./div/div/div/div/span/a').extract_first()

            # Skip <li> elements that are not thread entries (no link found)
            if item['href'] is None:
                continue

            # The href is relative, so resolve it against the current page URL
            yield scrapy.Request(
                response.urljoin(item["href"]),
                callback=self.parse_detail,
                meta={"item": item},
            )

    def parse_detail(self, response):
        item = response.meta["item"]
        print(item)

nahongyan1997 posted on 2021-6-23 15:43:07

The page's HTML contains syntax errors, so although the browser can still render it, bs4 cannot locate the elements.
