[Fish Coins] Batch-scraping images from K-site (konachan) with Python, Python Discussion, Programming Languages, FishC Forum

小超努力学习中 posted on 2023-9-24 17:11:27

1111

白色浪花 posted on 2023-9-25 19:16:12

白色浪花 posted on 2023-9-25 19:17:04

Sunnysun. posted on 2023-9-25 22:30:06

liuhongrun2022 posted on 2023-9-3 10:46
Claiming the front row for myself, please rate this qwq

@学习编程中的Ben @歌者文明清理员 @Mike_python小 @陶远航 @zhangjinxuan @Ewan-A ...

Freeloading

18305177067 posted on 2023-9-26 11:54:01

Hidden content

360341024 posted on 2023-9-26 17:08:56

Here to pick up the good stuff

sam_alphatop posted on 2023-9-27 10:09:27

Let me give it a try

不愧是我 posted on 2023-9-27 15:21:39

lantest posted on 2023-9-27 20:37:26

Really impressive (tql)

harryhan123 posted on 2023-9-27 21:47:23

lantest posted on 2023-9-27 23:32:23

import os
import random
import string

import requests
import urllib3
from bs4 import BeautifulSoup

"""
Mainly for batch-fetching images from konachan.net (drool)
No multithreading
"""

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
}

# Suppress InsecureRequestWarning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def getContent(url):
    # Fetch a page and return its HTML as text
    req = requests.get(url, headers=headers)
    req.encoding = 'utf-8'
    html = req.text
    return html

def getAllImageContentUrls(html):
    # Collect the post-page URL behind every thumbnail on a listing page
    ImageContentUrls = []
    soup = BeautifulSoup(html, 'html.parser')
    thumb_links = soup.find_all('a', class_='thumb')
    for link in thumb_links:
        href_value = link['href']
        # Strip any leading slash so the joined URL has no double slash
        imageContentUrl = "https://konachan.net/" + href_value.lstrip('/')
        ImageContentUrls.append(imageContentUrl)
    return ImageContentUrls

def getImageUrl(html):
    # Read the image URL from the <link rel="image_src"> tag of a post page
    soup = BeautifulSoup(html, 'html.parser')
    imageUrl = soup.find('link', rel='image_src').get('href')
    return imageUrl

def downloadImage(url, path):
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        # Use a random 5-character code as the file name
        characters = string.digits + string.ascii_letters
        random_code = ''.join(random.choice(characters) for _ in range(5))
        # os.path.splitext returns (root, ext); keep only the extension
        file_extension = os.path.splitext(url)[1]
        fileName = random_code + file_extension
        save_path = path + fileName
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        # The with block closes the file, so no explicit close() is needed
        with open(save_path, 'wb') as file:
            file.write(response.content)
        print("Saved to: {}".format(save_path))

def getImageUrls(ImageContentUrls):
    # Visit each post page and collect the full-size image URL
    imageUrls = []
    for i in ImageContentUrls:
        html = getContent(i)
        imageUrl = getImageUrl(html)
        imageUrls.append(imageUrl)
    return imageUrls

if __name__ == '__main__':

    """important"""
    print("Start")
    count = 0

    # First page
    startPage = 10
    # Last page
    endPage = 11
    # Save directory
    path = "D:\\seseimage\\"

    for i in range(startPage, endPage + 1):
        url = "https://konachan.net/post?page=" + str(i)
        imageContentUrls = getAllImageContentUrls(getContent(url))
        imageUrls = getImageUrls(imageContentUrls)
        for j in imageUrls:
            downloadImage(j, path)
            count = count + 1
    print("Download finished, {} images in total".format(count))
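The file-naming step in downloadImage relies on os.path.splitext, which returns a (root, ext) tuple rather than a string, so only the second element can be appended to the random code. Below is a minimal sketch of that step on its own. The helper name extension_from_url and the example.com URLs are hypothetical and only for illustration. It also assumes an image URL might carry a query string, which the script above does not handle.

```python
import os
from urllib.parse import urlsplit


def extension_from_url(url):
    """Return the file extension (including the dot) of the path part of a URL."""
    # urlsplit separates the path from any query string or fragment,
    # so "?size=large" does not end up inside the extension.
    url_path = urlsplit(url).path
    # splitext returns a (root, ext) tuple; index 1 is the extension.
    return os.path.splitext(url_path)[1]


# Hypothetical URLs, used only to show the behaviour.
print(extension_from_url("https://example.com/image/abc/picture.jpg"))
print(extension_from_url("https://example.com/a/b.png?size=large"))
```

Calling os.path.splitext directly on the second URL would give ".png?size=large", which is why the sketch strips the query string first.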

fengyu315 posted on 2023-9-28 09:38:17

Source code

360341024 posted on 2023-9-28 13:54:02

I understood the code, but once I close it and try to write it myself I'm completely lost again

额外减小 posted on 2023-10-1 13:25:25

Let me take a look, what is K-site?

额外减小 posted on 2023-10-1 16:48:49

I'm here to freeload some Fish Coins

Wei-Yuanzhe posted on 2023-10-3 10:04:33

{:10_257:}

yinda_peng posted on 2023-10-18 23:21:14

lantest posted on 2023-9-27 23:32

The code runs, but it downloads 0 images
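In the script, count goes up once for every image URL regardless of the HTTP status of the download. A total of 0 therefore means no a.thumb links were found on the listing pages. Below is a minimal sketch for checking that step in isolation, using only the standard library instead of BeautifulSoup. The class ThumbLinkParser, the function find_thumb_hrefs and the sample HTML are hypothetical and only for illustration. The real page markup may differ.

```python
from html.parser import HTMLParser


class ThumbLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains 'thumb'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "thumb" in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def find_thumb_hrefs(html):
    parser = ThumbLinkParser()
    parser.feed(html)
    return parser.hrefs


# Hypothetical listing-page fragment, not real site markup.
sample = (
    '<ul><li><a class="thumb" href="/post/show/1">x</a></li>'
    '<li><a href="/other">y</a></li></ul>'
)
print(find_thumb_hrefs(sample))
```

Feeding it the text returned by getContent for one listing page shows whether the page contains any matching links. An empty list points at the page content or the selector rather than at the download step.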

a905448839 posted on 2023-10-22 23:26:33

Can C# be used for web scraping?

Ethan惊天 posted on 2023-10-23 11:30:50

Fish Coins

chen_1123 posted on 2023-11-23 15:55:25

+4 Fish Coins


FishC Forum's Archiver