Spider code
    import scrapy

    from tutorial.items import DmozItem


    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        # Matches the host used in start_urls (the original listed "dmoz.org").
        allowed_domains = ["dmoztools.net"]
        start_urls = [
            "http://dmoztools.net/Computers/Programming/Languages/Python/Books/",
            "http://dmoztools.net/Computers/Programming/Languages/Python/Resources/",
        ]

        def parse(self, response):
            # Each directory entry sits in its own "title-and-desc" block.
            for site in response.xpath('//div[@class="title-and-desc"]'):
                item = DmozItem()
                item['title'] = site.xpath('a/div/text()').extract()
                item['link'] = site.xpath('a/@href').extract()
                # The class value is compared exactly, trailing space included.
                item['desc'] = site.xpath('div[@class="site-descr "]/text()').extract()
                yield item
Items code
    import scrapy


    class DmozItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        title = scrapy.Field()
        link = scrapy.Field()
        desc = scrapy.Field()
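To check what the three relative XPath expressions in the spider pick out, without running a crawl, here is a rough sketch using only the standard library. It swaps Scrapy's selectors for `xml.etree.ElementTree`, which supports only a small XPath subset, so it illustrates the extraction logic and is not the spider itself. The `SAMPLE` markup and the `extract_entries` helper are made up for illustration and assume the page nests the title inside `a/div`, as the spider's expressions imply.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup imitating one directory entry; not copied from the real site.
SAMPLE = """
<html><body>
  <div class="title-and-desc">
    <a href="http://example.com/book"><div class="site-title">Example Book </div></a>
    <div class="site-descr "> An example description. </div>
  </div>
</body></html>
"""


def extract_entries(markup):
    """Return one dict per "title-and-desc" block (hypothetical helper)."""
    root = ET.fromstring(markup.strip())
    entries = []
    for site in root.iterfind(".//div[@class='title-and-desc']"):
        link = site.find("a")        # counterpart of a/@href
        title = site.find("a/div")   # counterpart of a/div/text()
        # The class value is compared exactly, trailing space included.
        desc = site.find("div[@class='site-descr ']")
        entries.append({
            "title": (title.text or "").strip() if title is not None else "",
            "link": link.get("href", "") if link is not None else "",
            "desc": (desc.text or "").strip() if desc is not None else "",
        })
    return entries


entries = extract_entries(SAMPLE)
print(entries)
```

Unlike `.extract()` in the spider, which returns a list of strings per field, this sketch returns one stripped string per field, so the two outputs are not directly interchangeable.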