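I'm new to web scraping. Following an example more or less step by step, I used Scrapy to crawl the great Liao Xuefeng's Python tutorial, but the scraped data comes out in no order at all! (I know Scrapy handles requests asynchronously, but with no ordering the output is basically unreadable.) How should I deal with this? Here is my spider code: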
```python
# -*- coding: utf-8 -*-
import scrapy


class Tt2Spider(scrapy.Spider):
    name = 'tt2'
    allowed_domains = ['liaoxuefeng.com']
    start_urls = ['https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000']

    def parse(self, response):
        # Each <a> in the sidebar table of contents links to one chapter
        tr_list = response.xpath("//ul[@id='x-wiki-index']/div/div/a")
        for tr in tr_list:
            item = {}
            item['href'] = tr.xpath("./@href").extract_first()
            item['title'] = tr.xpath("./text()").extract_first()
            # urljoin resolves the relative href against the current page URL
            next_url = response.urljoin(item['href'])
            yield scrapy.Request(
                next_url,
                callback=self.parse2,
                meta={'item': item}
            )

    def parse2(self, response):
        item = response.meta["item"]
        item['content'] = response.xpath("//div[@class='x-wiki-content x-main-content']/p/text()").extract()
        # Yield the item (instead of printing it) so pipelines and feed exports receive it
        yield item
```
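One common way to get readable output despite the asynchronous downloads is to record each chapter's position in the table of contents and sort once the crawl is finished. Below is a minimal sketch of a hypothetical Scrapy item pipeline (the class name `OrderedOutputPipeline` and the file name `tutorial.json` are made up for illustration). It assumes `parse()` is changed to loop with `for index, tr in enumerate(tr_list):` and to set `item['index'] = index`, and that `parse2()` yields the item rather than printing it. The pipeline only buffers items in `process_item` and writes them, sorted, in `close_spider`.

```python
import json


class OrderedOutputPipeline:
    """Hypothetical pipeline: buffers every item, then writes them sorted
    by the 'index' field the spider is assumed to attach in parse()."""

    def __init__(self, path="tutorial.json"):
        self.path = path  # placeholder output file name
        self.items = []

    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        # Responses arrive in whatever order the downloads finish,
        # so just collect the item here instead of writing it immediately.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # Restore table-of-contents order using the recorded index,
        # then write everything out in one go.
        self.items.sort(key=lambda it: it["index"])
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.items, f, ensure_ascii=False, indent=2)
```

To enable it, the pipeline would be registered in `settings.py`, e.g. `ITEM_PIPELINES = {'yourproject.pipelines.OrderedOutputPipeline': 300}` (the module path is a placeholder). Holding everything in memory is fine for a tutorial of a few hundred pages. Lowering `CONCURRENT_REQUESTS` is not, on its own, a guarantee of ordering, which is why sorting after the crawl tends to be the more reliable route.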