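This post was last edited by zhouleiqiang on 2020-9-3 12:56
Hi everyone, today I reached the Scrapy framework part of the tutorial, the exercise that scrapes each site's title, description, and URL. When I run the spider I get a "Spider error processing" error. Everything worked when I tested it step by step in the command-line window, but the error showed up once I wrote the code in IDLE: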
2020-09-02 18:19:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://curlie.org/Computers/Pro ... s/Python/Resources/> (referer: None)
2020-09-02 18:19:35 [scrapy.core.scraper] ERROR: Spider error processing <GET https://curlie.org/Computers/Pro ... s/Python/Resources/> (referer: None)
The code is as follows:
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    # Should match the domain of start_urls (the post had "dmoz.org" here)
    allowed_domains = ["curlie.org"]
    start_urls = [
        'https://curlie.org/Computers/Programming/Languages/Python/Books/',
        'https://curlie.org/Computers/Programming/Languages/Python/Resources/',
    ]

    def parse(self, response):
        # The failing version had scrapy.slection here; the module is scrapy.selector
        sel = scrapy.selector.Selector(response)
        sites = sel.xpath('//div[@class="title-and-desc"]')
        for site in sites:
            # Site title
            title = site.xpath('div[@class="site-title"]/a/text()').extract()
            # href holds the site URL
            link = site.xpath('div[@class="site-title"]/a/@href').extract()
            # Site description text
            text = site.xpath('div[@class="site-descr"]/text()').extract()
            # The failing version printed desc, a name that was never defined
            print(title, link, text)