This post was last edited by wulijuan on 2019-2-20 18:54
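Could someone tell me why the JSON file I end up with is 0 KB? Is my XPath lookup wrong?
I am using Scrapy to crawl 'http://dmoztools.net'. The project has already been created, and the item container is defined in the items file: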
import scrapy

class Domz_Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    links = scrapy.Field()
    desc = scrapy.Field()
Next I wrote the spider, imported Domz_Item, and appended title, links, and desc to the list items:
import scrapy
from domzspider.items import Domz_Item

class DmozSpider(scrapy.Spider):
    name = 'dmoz'
    allowed_domains = ['http://dmoztools.net']
    start_url = ['http://dmoztools.net/Computers/Open_Source/']

    def parse(self, response):
        sel = scrapy.selector.Selector(response)
        # Look up nodes with XPath
        sites = sel.xpath('//div id="site-list-content"/div class="site-item')
        items = []
        for site in sites:
            item = Domz_Item()
            item['title'] = site.xpath('//a target/text()').extract()
            item['links'] = site.xpath('//a target/@href').extract()
            item['desc'] = site.xpath('//div class="title-and-desc/div/text()').extract()
            item.append(item)
        return items
(I am not sure whether this way of writing the XPath lookup is correct.)
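On the XPath question: attribute tests normally go inside square-bracket predicates such as div[@id="..."], and inside a loop a path usually starts with .// so that it stays relative to the current node (in Scrapy, a path starting with // searches the whole document again, even when called on a sub-selector). Here is a minimal sketch of that syntax using only Python's standard xml.etree.ElementTree, which supports a subset of XPath. The markup, the site-descr class, and the extract_sites helper are all made up for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for this sketch; the real page may differ.
PAGE = """
<html><body>
  <div id="site-list-content">
    <div class="site-item">
      <div class="title-and-desc">
        <a target="_blank" href="http://example.com/a">Site A</a>
        <div class="site-descr">First description</div>
      </div>
    </div>
    <div class="site-item">
      <div class="title-and-desc">
        <a target="_blank" href="http://example.com/b">Site B</a>
        <div class="site-descr">Second description</div>
      </div>
    </div>
  </div>
</body></html>
"""


def extract_sites(root):
    """Collect title, link, and description for each site-item block."""
    results = []
    # Attribute tests are written as predicates: [@attr='value'].
    path = ".//div[@id='site-list-content']/div[@class='site-item']"
    for site in root.findall(path):
        # The leading "." keeps each lookup inside the current site node.
        link = site.find(".//a[@target]")
        desc = site.find(".//div[@class='site-descr']")
        results.append({
            "title": link.text.strip(),
            "links": link.get("href"),
            "desc": desc.text.strip(),
        })
    return results


sites = extract_sites(ET.fromstring(PAGE.strip()))
for site in sites:
    print(site)
```

Scrapy's selectors are built on lxml and accept full XPath 1.0, so there the same idea can end in /text() or /@href steps instead of the .text and .get() calls used here.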
Then, in cmd, I saved the scraped content in JSON format, but the saved JSON file is 0 KB.