CrawlSpider help request, Python Discussion, Programming Languages, 鱼C论坛

唯爱丶雪 posted on 2022-1-9 11:32:35

CrawlSpider help request

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class TbSpider(CrawlSpider):
    name = 'tb'
    allowed_domains = ['wenzhou.gov.cn']
    start_urls = ['https://wlwz.wenzhou.gov.cn/wzlist_1_1.html']

    rules = (
        Rule(LinkExtractor(allow=r'/wzshow_\d+\.html'), callback='parse_item'),
        Rule(LinkExtractor(allow=r'/wzlist_1_\d+\.html'), follow=True),
    )

    def parse_item(self, response):
        print(response.body)
        item = {}
        item['title'] = response.xpath('//td[@class="a1"]//a/text()').get()
        print(item['title'])
        return item
        #item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
        #item['name'] = response.xpath('//div[@id="name"]').get()
        #item['description'] = response.xpath('//div[@id="description"]').get()

Why is my output empty? I can't even get response.body to print.

阿奇_o posted on 2022-1-9 11:32:36

I haven't used Scrapy, but I think your callback is written wrong.

唯爱丶雪 posted on 2022-1-9 14:29:10

阿奇_o posted on 2022-1-9 14:02
I haven't used Scrapy, but I think your callback is written wrong.

No, I wrote it by following an existing example.

唯爱丶雪 posted on 2022-1-9 15:34:13

Such a big forum, and not a single person can answer this question!

唯爱丶雪 posted on 2022-1-9 22:56:50

The points are yours. This is too hard.

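阿奇_o posted on 2022-1-10 16:23:40

唯爱丶雪 posted on 2022-1-9 22:56
The points are yours. This is too hard.

I tinkered with this last night. You could try the Scrapy shell first. I couldn't find the elements with CSS selectors at first, but this XPath works:

# scrapy shell https://wlwz.wenzhou.gov.cn/wzlist_1_1.html
# Once inside the Scrapy shell, you can run experiments like these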

In : response.xpath('//*[@class="wzlist"]//*[@class="a1"]//a//@href').extract()
Out:
['/wzshow_170611.html',
'/wzshow_170605.html',
'/wzshow_170607.html',
'/wzshow_170592.html',
'/wzshow_170586.html',
'/wzshow_170588.html',
'/wzshow_170589.html',
'/wzshow_170590.html',
'/wzshow_170594.html',
'/wzshow_170596.html',
'/wzshow_170598.html',
'/wzshow_170600.html',
'/wzshow_170601.html',
'/wzshow_170603.html',
'/wzshow_170582.html',
'/wzshow_170584.html']
In : response.xpath('//*[@class="wzlist"]//*[@class="a1"]//a//text()').extract()
Out:
['EJ785405229JP海关留验10天',
'鳌江镇银泰花园小区防疫管控问题',
'温州海关驻邮局办事处的电话一直打不通',
'城发集团，影响形象的错别字',
'浙南科技城龙湖揽镜到底有没有保障性租赁...',
'南环线',
'群租房举报第三次',
'高教博园房产证',
'无证非法经营餐饮请求有关部门查处',
'夫妻双方公积金贷款额度是否可以比照温州...',
'在当前反诈形势下，若被诈骗，受害人信息...',
'柏林公馆店铺招牌',
'关于金域传奇小区门口道路管理',
'关于高校毕业生就业补贴的发放咨询',
'哲学教育无办学许可资质，双减政策依然可...',
'水头镇疫情防控条例']
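
Since the shell session shows the list page does contain /wzshow_ links, one further thing worth ruling out is whether the allow patterns in rules actually match them. Below is a minimal sketch using only the standard library, not Scrapy itself. It assumes that LinkExtractor resolves each href to an absolute URL and then applies the allow pattern as a regex search, which is my understanding of its behavior. The classify helper and the constant names are hypothetical and exist only for this check.

```python
import re
from urllib.parse import urljoin

# Patterns copied from the spider's rules in the first post.
DETAIL_PATTERN = re.compile(r'/wzshow_\d+\.html')
LIST_PATTERN = re.compile(r'/wzlist_1_\d+\.html')

# The start URL from the spider, used as the base for relative links.
BASE_URL = 'https://wlwz.wenzhou.gov.cn/wzlist_1_1.html'

# A few of the relative hrefs returned by the XPath query above.
SAMPLE_HREFS = ['/wzshow_170611.html', '/wzshow_170605.html', '/wzshow_170607.html']


def classify(href, base_url=BASE_URL):
    """Return (absolute_url, kind), where kind is 'detail', 'list' or None.

    Hypothetical helper: it only approximates how the two rules would
    sort a link, under the assumption stated in the text above.
    """
    absolute = urljoin(base_url, href)
    if DETAIL_PATTERN.search(absolute):
        return absolute, 'detail'
    if LIST_PATTERN.search(absolute):
        return absolute, 'list'
    return absolute, None


if __name__ == '__main__':
    for href in SAMPLE_HREFS:
        print(classify(href))
```

If every sample comes back as 'detail', the patterns themselves are probably not what keeps parse_item from running, and the crawl log is the next place to look (for example, whether the requests were filtered or never got a response). Note also that the a1 elements above were found on the list page; this thread does not show whether the detail pages contain them.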

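唯爱丶雪 posted on 2022-1-10 21:47:17

阿奇_o posted on 2022-1-10 16:23
I tinkered with this last night. You could try the Scrapy shell first. I couldn't find the elements with CSS selectors at first, but this XPath works

Thank you, but I already know Scrapy. I only asked because I had just started learning CrawlSpider.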

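阿奇_o posted on 2022-1-11 00:01:49

唯爱丶雪 posted on 2022-1-10 21:47
Thank you, but I already know Scrapy. I only asked because I had just started learning CrawlSpider.

Haha, I had never used Scrapy before. Last night I read up on it in a book and practiced a bit on my own.
Now I can crawl this site and save the data to MongoDB. It feels like quite an achievement, haha.

Thanks for your question ^_^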
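
The thread does not include the code used for the MongoDB step, so the following is only a minimal sketch of what such a step might look like as a Scrapy item pipeline. It assumes the pymongo package is installed and a MongoDB server is reachable at the default local address. The class name, database name, and collection name are all hypothetical. The method names (open_spider, close_spider, process_item) follow the item pipeline interface as I understand it from the Scrapy documentation.

```python
class MongoPipeline:
    """Hypothetical item pipeline that stores each scraped item in MongoDB."""

    def __init__(self, mongo_uri='mongodb://localhost:27017',
                 db_name='wenzhou', collection_name='posts'):
        # All three defaults are illustrative assumptions, not values from the thread.
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Imported here so the module can be loaded even without pymongo installed.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        # Release the connection when the crawl ends, if one was opened.
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy and pass the original item down the pipeline.
        self.collection.insert_one(dict(item))
        return item
```

To try it, the class would need to be listed under ITEM_PIPELINES in the project's settings.py, using whatever module path it lives at in your project.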

唯爱丶雪 posted on 2022-1-12 17:55:14

阿奇_o posted on 2022-1-11 00:01
Haha, I had never used Scrapy before. Last night I read up on it in a book and practiced a bit on my own.
Now I can crawl this site, ...

You're welcome.
