FishC Forum


Scrapy persistent storage: every other item in the output is None

Posted on 2021-6-5 13:48:03
import scrapy
from tutorial.items import QuoteItem

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    #allowed_domains = ['quoten.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        tr_list = response.xpath('//div[@class="col-md-8"]//div')
        for tr in tr_list:
            item = QuoteItem()
            item['text'] = tr.xpath('./span[@class="text"]/text()').extract_first()
            item['author'] = tr.xpath('./span/small[@class="author"]/text()').extract_first()
            item['tags'] = tr.xpath('./div/a/text()').extract()
            yield item


OP | Posted on 2021-6-5 13:49:11
This is the Item definition:
import scrapy

class QuoteItem(scrapy.Item):
    # Define three fields
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()

OP | Posted on 2021-6-5 13:49:42
This is the pipeline:
class TutorialPipeline:
    def process_item(self, item, spider):
        print(item)
        return item
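
Since the pipeline above only prints each item, here is a minimal sketch of what actually persisting the items could look like. It relies on the pipeline hooks open_spider, close_spider and process_item; the class name JsonLinesPipeline and the default file name quotes.jl are hypothetical and not part of the original project.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline that appends each item to a file as one JSON line."""

    def __init__(self, path="quotes.jl"):
        # "quotes.jl" is an assumed output file name, chosen for illustration.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects alike.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + "\n")
        return item
```

To take effect, a pipeline like this would have to be listed under ITEM_PIPELINES in settings.py, for example as 'tutorial.pipelines.JsonLinesPipeline' (a module path assumed from the import in the spider).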

OP | Posted on 2021-6-5 13:50:53
Output:
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
         'cannot be changed without changing our thinking.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
         'than our abilities.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
         'is a miracle. The other is as though everything is a miracle.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
         'novel, must be intolerably stupid.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
         'absolutely ridiculous than absolutely boring.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
         'you are not.”'}
{'author': None, 'tags': [], 'text': None}
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
{'author': None, 'tags': [], 'text': None}
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
         "it's in hot water.”"}
{'author': None, 'tags': [], 'text': None}
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
{'author': None, 'tags': [], 'text': None}

OP | Posted on 2021-6-5 13:51:27
wcq15759797758 posted on 2021-6-5 13:50
Output:
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
...

The data does get scraped, but after every scraped item the next item's fields are all empty.
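
A likely cause, judging from the strictly alternating output: '//div[@class="col-md-8"]//div' selects every descendant div, not just the quote containers. If each quote block contains a nested div holding its tags (an assumption about the page markup), that inner div is visited too and produces an item with no text, no author and no tags. The sketch below reproduces the effect with the standard library's ElementTree on a made-up, simplified snippet. SAMPLE and the extract helper are hypothetical, and ElementTree's XPath subset only stands in for Scrapy's selectors here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page markup (not copied from the site).
SAMPLE = """
<div class="col-md-8">
  <div class="quote">
    <span class="text">Quote one</span>
    <span>by <small class="author">Author A</small></span>
    <div class="tags"><a class="tag">x</a><a class="tag">y</a></div>
  </div>
  <div class="quote">
    <span class="text">Quote two</span>
    <span>by <small class="author">Author B</small></span>
    <div class="tags"><a class="tag">z</a></div>
  </div>
</div>
"""


def extract(node):
    """Apply the same relative paths the spider uses to a single node."""
    text = node.find("./span[@class='text']")
    author = node.find("./span/small[@class='author']")
    return {
        "text": text.text if text is not None else None,
        "author": author.text if author is not None else None,
        "tags": [a.text for a in node.findall("./div/a")],
    }


root = ET.fromstring(SAMPLE)

# Counterpart of '//div[@class="col-md-8"]//div': every descendant div,
# including the inner tags div, so empty items appear in between.
broad = [extract(div) for div in root.findall(".//div")]

# Selecting only the quote containers gives one item per quote.
narrow = [extract(div) for div in root.findall("./div[@class='quote']")]

for item in broad:
    print(item)
```

In the spider, the corresponding change would presumably be to narrow the selector to the quote containers, e.g. response.xpath('//div[@class="quote"]'), assuming each quote is wrapped in a div with class "quote".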
