Posted on 2018-7-14 19:17:08
Best answer
This site can be crawled. I ran into the same problem you mention of only being able to grab one image. At first I used BeautifulSoup to extract the image links, but no matter what I tried it only returned the first one; switching to a regular expression extracted every image link on the page (a pure-BeautifulSoup alternative is sketched after the code). I haven't changed the main function of this code, so it only crawls the first page, because the first page's URL follows a different pattern from the later ones. You can adapt it yourself to crawl the following pages; see the second sketch after the code.
import re
import requests
from bs4 import BeautifulSoup as bs
import os


def url_open(url):
    # Send a browser User-Agent so the site doesn't reject the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER'}
    res = requests.get(url, headers=headers)
    return res


def findlinks(res):
    # Narrow the page down to the thumbnail list with BeautifulSoup,
    # then pull every detail-page link out of it with a regex
    soup = bs(res.text, 'lxml')
    target = soup.find_all('ul', class_="clearfix")
    links = re.findall(r'href="(.*?\.html)"', str(target))
    return links


def find_img(links):
    # The hrefs are site-relative, so prepend the site root
    urlhead = 'http://pic.netbian.com'
    return [urlhead + each for each in links]


def save_img(pages):
    urlhead = 'http://pic.netbian.com'
    for each in pages:
        # Open each detail page and grab the full-size .jpg link
        res = url_open(each)
        soup = bs(res.content, 'lxml')
        link = re.findall(r'src="(/.*\.jpg)"', str(soup))
        url = urlhead + link[0]
        filename = url.split('/')[-1]
        img = url_open(url)
        with open(filename, 'wb') as f:
            f.write(img.content)


if __name__ == '__main__':
    os.makedirs('彼岸图', exist_ok=True)  # create the folder on the first run
    os.chdir('彼岸图')
    url = 'http://pic.netbian.com/4kyingshi/index.html'
    res = url_open(url)
    links = findlinks(res)
    pages = find_img(links)
    save_img(pages)
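If you'd rather avoid the regex, the same links can also be pulled with BeautifulSoup alone by iterating over the <a> tags instead of converting the tags to a string. This is only a sketch, assuming the detail links sit inside the ul.clearfix lists as the regex above also assumes:

def findlinks_bs(res):
    # Pure-BeautifulSoup variant of findlinks(): collect every
    # detail-page href inside the ul.clearfix thumbnail lists
    soup = bs(res.text, 'lxml')
    links = []
    for ul in soup.find_all('ul', class_='clearfix'):
        for a in ul.find_all('a', href=True):
            if a['href'].endswith('.html'):
                links.append(a['href'])
    return links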
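To crawl the later pages, you could generate the page URLs in a loop and reuse the functions above. Page 1 is index.html, and the later pages appear to follow an index_N.html pattern; that pattern is an assumption on my part, so confirm it in your browser before relying on it. A minimal sketch:

def crawl_pages(base, last_page):
    # Page 1 is index.html; pages 2..N look like index_2.html, index_3.html, ...
    # (an assumption -- check the real URLs in your browser first)
    for n in range(1, last_page + 1):
        page = base + ('index.html' if n == 1 else 'index_%d.html' % n)
        res = url_open(page)
        save_img(find_img(findlinks(res)))

For example, crawl_pages('http://pic.netbian.com/4kyingshi/', 3) would fetch the first three pages of that category.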
Attachment: 彼岸图.zip (826 Bytes, 13 downloads)