Original poster |
Posted on 2018-4-9 11:18:59
I tried it the way you suggested, and the crawl succeeded.
Here is the code:
import scrapy
from scrquote_all.items import ScrquoteAllItem


class QuoteallSpider(scrapy.Spider):
    name = 'quoteall'
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        # Yield one item per quote block on the current page.
        for quote in response.css('div.quote'):
            item = ScrquoteAllItem()
            item['text'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item

        # Follow the "next" link, if any, and parse that page the same way.
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
But I still have one question. What comes back in the end is still one big list, which is the same result I originally set out to produce:
[
{"text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d", "author": "Albert Einstein", "tags": ["change", "deep-thoughts", "thinking", "world"]},
{"text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d", "author": "J.K. Rowling", "tags": ["abilities", "choices"]},
{"text": "\u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d", "author": "Albert Einstein", "tags": ["inspirational", "life", "live", "miracle", "miracles"]},
...
]
This is the format of the JSON file that was saved in the end. I clearly yield each result individually, so why is the saved file still one big list?
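
To make the question concrete, here is a minimal sketch of what I suspect is going on, assuming the file was written by a JSON exporter that wraps everything in a single array (for example a run with `-o` and a `.json` file name, which is my assumption here). The generator and the `TinyJsonArrayExporter` class are made up for illustration and are not Scrapy's own code. The point is only that items can be yielded one at a time while the writer still opens a `[` before the first item and closes a `]` after the last one.

```python
import io
import json


def parse_like_generator():
    """Stand-in for a spider's parse(): yields one item dict at a time."""
    yield {"author": "Albert Einstein", "tags": ["change", "world"]}
    yield {"author": "J.K. Rowling", "tags": ["abilities", "choices"]}


class TinyJsonArrayExporter:
    """Hypothetical exporter (not Scrapy's class) that writes a JSON array."""

    def __init__(self, stream):
        self.stream = stream
        self.first = True

    def start(self):
        # Open the array once, before any item arrives.
        self.stream.write("[\n")

    def export_item(self, item):
        # Each item is written separately, with a comma between items.
        if not self.first:
            self.stream.write(",\n")
        self.first = False
        self.stream.write(json.dumps(item))

    def finish(self):
        # Close the array once, after the last item.
        self.stream.write("\n]")


def export_all(items):
    stream = io.StringIO()
    exporter = TinyJsonArrayExporter(stream)
    exporter.start()
    for item in items:  # items are consumed one by one
        exporter.export_item(item)
    exporter.finish()
    return stream.getvalue()


output = export_all(parse_like_generator())
print(output)
```

If that is really the cause, then the list comes from the output format and not from the spider. As far as I know, the JSON Lines feed format (a `.jl` output file) writes one object per line without the surrounding brackets.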