FishC Forum (鱼C论坛)


Scrapy persistent storage: every other item in the output is None

Posted on 2021-6-5 13:48:03 | Bounty: 5 fish coins (鱼币)
import scrapy
from tutorial.items import QuoteItem


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    # allowed_domains = ['quoten.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        tr_list = response.xpath('//div[@class="col-md-8"]//div')
        for tr in tr_list:
            item = QuoteItem()
            item['text'] = tr.xpath('./span[@class="text"]/text()').extract_first()
            item['author'] = tr.xpath('./span/small[@class="author"]/text()').extract_first()
            item['tags'] = tr.xpath('./div/a/text()').extract()
            yield item

Latest course from 小甲鱼 -> https://ilovefishc.com

Original poster | Posted on 2021-6-5 13:49:11
Here is the Item definition:
import scrapy


class QuoteItem(scrapy.Item):
    # Define three fields
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()

Original poster | Posted on 2021-6-5 13:49:42
Here is the pipeline (pipelines.py):

class TutorialPipeline:
    def process_item(self, item, spider):
        print(item)
        return item
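
The pipeline above only prints each item, so nothing is actually persisted yet. A minimal sketch of a pipeline that writes items to disk might look like the following. The class name `JsonLinesPipeline` and the file name `quotes.jl` are hypothetical choices for illustration; `open_spider`, `close_spider`, and `process_item` are the hook names Scrapy calls on a pipeline.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each item as one JSON object per line."""

    def __init__(self, path="quotes.jl"):
        # "quotes.jl" is an assumed output file name, not from the original post.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for scrapy.Item instances as well as plain dicts.
        # ensure_ascii=False keeps the curly quotes in the text readable.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

To take effect, a pipeline like this would have to be listed in `ITEM_PIPELINES` in the project's settings.py (for this project, presumably under a path such as `tutorial.pipelines.JsonLinesPipeline`). Note that it writes whatever the spider yields, so empty items would be written too unless the spider stops producing them.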

Original poster | Posted on 2021-6-5 13:50:53
Output:
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
         'cannot be changed without changing our thinking.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
         'than our abilities.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
         'is a miracle. The other is as though everything is a miracle.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
         'novel, must be intolerably stupid.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
         'absolutely ridiculous than absolutely boring.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
         'you are not.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
{'author': None, 'tags': [], 'text': None}
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
         "it's in hot water.”"}
{'author': None, 'tags': [], 'text': None}
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
{'author': None, 'tags': [], 'text': None}

Original poster | Posted on 2021-6-5 13:51:27
wcq15759797758 posted on 2021-6-5 13:50
Output:
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
...

The data is being scraped, but after every scraped item the next item is completely empty.
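
A likely cause is the row selector: `//div[@class="col-md-8"]//div` matches every descendant `div`, which would include the tags container nested inside each quote as well as the quote container itself. When the relative lookups run against a tags container, none of them match, which gives an all-empty item after each real one. The sketch below reproduces that pattern with the standard library's `xml.etree.ElementTree` standing in for Scrapy selectors. The HTML snippet is a simplified assumption about the page's markup, and the helper `extract` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed stand-in for the page markup:
# each div.quote contains a nested div.tags.
HTML = """
<body>
  <div class="col-md-8">
    <div class="quote">
      <span class="text">Quote A</span>
      <span>by <small class="author">Author A</small></span>
      <div class="tags"><a class="tag">x</a><a class="tag">y</a></div>
    </div>
    <div class="quote">
      <span class="text">Quote B</span>
      <span>by <small class="author">Author B</small></span>
      <div class="tags"><a class="tag">z</a></div>
    </div>
  </div>
</body>
"""


def extract(root, row_xpath):
    """Run the spider's three relative lookups on every node matched by row_xpath."""
    rows = []
    for node in root.findall(row_xpath):
        rows.append({
            "text": node.findtext("./span[@class='text']"),
            "author": node.findtext("./span/small[@class='author']"),
            "tags": [a.text for a in node.findall("./div/a")],
        })
    return rows


root = ET.fromstring(HTML)

# Same shape as the selector in the spider: all descendant divs, div.tags included.
broad = extract(root, ".//div[@class='col-md-8']//div")
# Narrower selector: only the quote containers.
narrow = extract(root, ".//div[@class='quote']")

for row in broad:
    print(row)
print("---")
for row in narrow:
    print(row)
```

With the broad selector, real items alternate with all-empty ones, and with the narrow selector only the real items remain. If the live page is structured like this sketch, the equivalent change in the spider would be to select the quote containers directly, for example `response.xpath('//div[@class="quote"]')`, and leave the three relative lookups as they are.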