This post was last edited by holistic杀手 on 2018-1-7 10:00
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    # Crawl scope: domain names only, not full URLs
    allowed_domains = ['dmoztools.net']
    start_urls = [
        'http://dmoztools.net/Computers/Programming/Languages/Python/Books/',
        'http://dmoztools.net/Computers/Programming/Languages/Python/Resources/'
    ]

    def parse(self, response):
        sel = scrapy.selector.Selector(response)
        # Each match is the <a> element inside a "title-and-desc" div
        sites = sel.xpath('//div[@class="title-and-desc"]/a')
        for site in sites:
            title = site.xpath('div/text()').extract()
            link = site.xpath('@href').extract()
            desc = site.xpath('text()').extract()
            print(title, link, desc)
Why does the XPath that grabs the title need a div in front of it?
And what is desc?
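To make the two questions concrete, here is a minimal sketch of how paths relative to the matched `<a>` element resolve. It rests on assumptions: the `SAMPLE` markup is hypothetical (the real page may be structured differently, and the `site-title` and `site-descr` class names are guesses), and it uses the standard library's `xml.etree.ElementTree` instead of Scrapy's selectors, so only a small subset of XPath is available.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an assumed shape for one listing entry, not copied from the site.
SAMPLE = """
<root>
  <div class="title-and-desc">
    <a href="http://example.com/book">
      <div class="site-title">Example Book</div>
    </a>
    <div class="site-descr">A short description.</div>
  </div>
</root>
"""


def extract(markup):
    """Return (title, href, text inside <a>, description) for each entry."""
    root = ET.fromstring(markup)
    rows = []
    for box in root.findall(".//div[@class='title-and-desc']"):
        a = box.find("a")
        # The title text sits in a <div> nested inside <a>, hence the 'div' step.
        title = a.find("div").text.strip()
        # The link is an attribute of <a> itself.
        href = a.get("href")
        # Text directly inside <a>, before its first child: only whitespace here.
        own_text = (a.text or "").strip()
        # In this assumed markup the description is a sibling <div> of <a>.
        descr = box.find("div[@class='site-descr']").text.strip()
        rows.append((title, href, own_text, descr))
    return rows


rows = extract(SAMPLE)
print(rows)
```

If the real page looks like this sample, `div/text()` is needed because the title is one level below `<a>`, and `text()` evaluated on `<a>` would yield only whitespace, so the description would have to be selected from the sibling div instead.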