狗宁 posted on 2020-8-27 09:26:26

Getting incomplete URLs when crawling images with CrawlSpider

Here is part of the code:
class BizhiSpider(CrawlSpider):
    name = 'bizhi'
    # allowed_domains = ['www.xxxx.com']
    start_urls = ['http://www.netbian.com/weimei/']
    start_link = LinkExtractor(allow=r'index_\d+?\.htm')
    detail_link = LinkExtractor(allow=r'desk.+?\.htm')
    link = []
    detail = []
    # my attempt: iterate over the extractors and prepend the domain
    # (this is the part that raises the error)
    for each in start_link:
        each = 'http://www.netbian.com/' + each
        link.append(each)
    for url in detail_link:
        url = 'http://www.netbian.com/' + url
        detail.append(url)
    # rules for the pagination and detail-page URLs
    rules = (
        Rule(link, callback='parse_item', follow=False),
        Rule(detail, callback='detail_parse', follow=True)
    )

    # callback for pagination pages
    def parse_item(self, response):
        pass
        # li_list = response.xpath('//*[@id="main"]/div/ul/li')
        # item = QuanzhanproItem()
        # for li in li_list:
        #     img_name = li.xpath('./a/@title').extract_first()
        #     item['img_name'] = img_name

    def detail_parse(self, response):
        src = response.xpath('//*[@id="main"]/div/div/p/a/img/@src').extract_first()
        item = QuanzhanproItem()
        item['src'] = src
        yield item



The link extractor gave me incomplete (relative) URLs, so I thought I could iterate over them and concatenate the domain myself, but it raised an error. Is there a decent way to solve this?

狗宁 posted on 2020-8-27 09:59:08

Solved. I was overthinking it: LinkExtractor automatically resolves links against the current domain and joins them for you. Hope this helps someone.
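For anyone curious what that resolution looks like: relative hrefs are joined against the URL of the page they were found on, the same way the standard library's `urljoin` does it. A minimal sketch (the sample hrefs below are illustrative, not taken from the actual page source) also shows why manually prepending `'http://www.netbian.com/'` can produce a wrong path:

```python
from urllib.parse import urljoin

# URL of the page being crawled
page_url = "http://www.netbian.com/weimei/"

# illustrative relative hrefs as they might appear in the HTML
hrefs = ["index_2.htm", "/desk/22621.htm"]

# resolve each href against the page URL, as LinkExtractor does internally
absolute = [urljoin(page_url, href) for href in hrefs]
print(absolute)
# ['http://www.netbian.com/weimei/index_2.htm',
#  'http://www.netbian.com/desk/22621.htm']
```

Note that naive concatenation would have turned `index_2.htm` into `http://www.netbian.com/index_2.htm`, losing the `/weimei/` path segment, whereas joining against the page URL keeps it.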