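I've recently been following Xiaojiayu's (小甲鱼) video tutorial on Scrapy. At the last step, the exported data contains only two rows instead of everything. If I change the code to print title, link and desc instead, the output is complete.
Can anyone tell me how to fix this, or what is causing it?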
```python
import scrapy
from tutorial.items import DmozItem


class DmozSpider(scrapy.Spider):
    name = 'dmoz'
    allowed_domains = ['dmoztools.net']  # Scrapy reads the plural attribute name
    start_urls = [
        'http://www.dmoztools.net/Computers/Programming/Languages/Python/Resources/',
        'http://www.dmoztools.net/Computers/Programming/Languages/Python/Books/',
    ]

    def parse(self, response):
        # filename = response.url.split('/')[-2]
        # with open(filename, 'wb') as f:
        #     f.write(response.body)
        sel = scrapy.selector.Selector(response)  # wrap the response in a selector
        sites = sel.xpath('//section/div/div/div/div[@class="title-and-desc"]')
        items = []

        for site in sites:
            item = DmozItem()  # one item per matching entry on the page

            item['title'] = site.xpath('a/div/text()').extract()
            item['link'] = site.xpath('a/@href').extract()
            item['desc'] = site.xpath('div/text()').extract()
            items.append(item)
        # return after the loop, so every item collected above is emitted
        return items
```
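The original listing lost its indentation when it was posted, so this is only a guess. A very common cause of getting exactly one row per start URL (two URLs, two rows) is that `return items` was indented inside the `for` loop, which makes `parse()` exit after the first `site`. A loop body that only prints never exits early, which would match the observation that printing looks complete. The sketch below reproduces the effect without Scrapy or a network. `PAGES` and `crawl` are hypothetical stand-ins: `crawl` loosely mimics how Scrapy gathers whatever `parse()` returns for each page, and plain dicts take the place of `DmozItem`.

```python
# Hypothetical stand-ins for the two start_urls pages; each lists several entries.
PAGES = {
    "Resources": ["r1", "r2", "r3"],
    "Books": ["b1", "b2", "b3", "b4"],
}


def parse_return_inside_loop(entries):
    """Suspected buggy shape: `return` is indented inside the for loop."""
    items = []
    for entry in entries:
        items.append({"title": entry})
        return items  # exits parse() right after the first entry


def parse_return_after_loop(entries):
    """Intended shape: `return` sits at the same level as `for`."""
    items = []
    for entry in entries:
        items.append({"title": entry})
    return items  # runs once, after every entry has been appended


def parse_with_yield(entries):
    """Generator style: emit each item as soon as it is built."""
    for entry in entries:
        yield {"title": entry}


def crawl(parse):
    """Collect everything parse() emits for each page, the way an exporter would see it."""
    rows = []
    for entries in PAGES.values():
        # Scrapy treats a None result as "no items"; mirror that here.
        rows.extend(parse(entries) or [])
    return rows


if __name__ == "__main__":
    print(len(crawl(parse_return_inside_loop)))  # 2: one row per page
    print(len(crawl(parse_return_after_loop)))   # 7: every entry
    print(len(crawl(parse_with_yield)))          # 7: every entry
```

Moving `return items` out to the same indentation level as `for` fixes it. Replacing the list with `yield item` inside the loop also works, and that is the style the current Scrapy tutorial uses.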