小超努力学习中
Posted on 2023-9-24 17:11:27
1111
白色浪花
Posted on 2023-9-25 19:16:12
66
白色浪花
Posted on 2023-9-25 19:17:04
99
Sunnysun.
Posted on 2023-9-25 22:30:06
liuhongrun2022 posted on 2023-9-3 10:46
Reserving a front-row spot, asking for a rating qwq
@学习编程中的Ben @歌者文明清理员 @Mike_python小 @陶远航 @zhangjinxuan @Ewan-A ...
Freeloading
18305177067
Posted on 2023-9-26 11:54:01
Hidden content
360341024
Posted on 2023-9-26 17:08:56
Here to collect the goods
sam_alphatop
Posted on 2023-9-27 10:09:27
Let me give it a try
不愧是我
Posted on 2023-9-27 15:21:39
kk
lantest
Posted on 2023-9-27 20:37:26
tql
harryhan123
Posted on 2023-9-27 21:47:23
6
lantest
Posted on 2023-9-27 23:32:23
import os
import random
import string

import requests
import urllib3
from bs4 import BeautifulSoup

"""
Batch-downloads images from konachan.net.
No multithreading.
"""

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
}

# Ignore InsecureRequestWarning warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def getContent(url):
    req = requests.get(url, headers=headers)
    req.encoding = 'utf-8'
    return req.text

def getAllImageContentUrls(html):
    imageContentUrls = []
    soup = BeautifulSoup(html, 'html.parser')
    thumb_links = soup.find_all('a', class_='thumb')
    for link in thumb_links:
        # The href is site-relative (e.g. /post/show/12345), so prepend the domain
        imageContentUrls.append("https://konachan.net" + link['href'])
    return imageContentUrls

def getImageUrl(html):
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find('link', rel='image_src').get('href')

def downloadImage(url, path):
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        characters = string.digits + string.ascii_letters
        random_code = ''.join(random.choice(characters) for _ in range(5))
        # splitext returns a (root, ext) tuple; we only want the extension
        file_extension = os.path.splitext(url)[1]
        save_path = os.path.join(path, random_code + file_extension)
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        with open(save_path, 'wb') as file:
            file.write(response.content)
        print("Saved to: {}".format(save_path))

def getImageUrls(imageContentUrls):
    imageUrls = []
    for i in imageContentUrls:
        imageUrls.append(getImageUrl(getContent(i)))
    return imageUrls

if __name__ == '__main__':
    print("Start")
    count = 0
    # start page
    startPage = 10
    # end page
    endPage = 11
    # save directory
    path = "D:\\seseimage\\"
    for i in range(startPage, endPage + 1):
        url = "https://konachan.net/post?page=" + str(i)
        imageContentUrls = getAllImageContentUrls(getContent(url))
        imageUrls = getImageUrls(imageContentUrls)
        for j in imageUrls:
            downloadImage(j, path)
            count += 1
    print("Download finished, {} images in total".format(count))
fengyu315
Posted on 2023-9-28 09:38:17
Source code
360341024
Posted on 2023-9-28 13:54:02
I can follow the code while reading it, but once I close it and try to write it myself I'm completely lost
额外减小
Posted on 2023-10-1 13:25:25
Let me take a look, what is the K site
额外减小
Posted on 2023-10-1 16:48:49
I want to freeload from Ubisoft
Wei-Yuanzhe
Posted on 2023-10-3 10:04:33
yinda_peng
Posted on 2023-10-18 23:21:14
lantest posted on 2023-9-27 23:32
The code runs, but it downloads 0 images
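A common reason a scraper like this downloads zero images is that the page markup changed, so `find_all('a', class_='thumb')` or the `<link rel="image_src">` lookup returns nothing. A minimal stdlib-only sketch of testing the extraction step offline (the sample HTML below is hypothetical, not the site's real markup):

```python
from html.parser import HTMLParser

# Hypothetical sample of a post page's <head>; the real markup may differ.
SAMPLE = '<html><head><link rel="image_src" href="https://konachan.net/image/abc.jpg"></head></html>'

class ImageSrcParser(HTMLParser):
    """Collects href values of <link rel="image_src"> tags."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == 'link' and d.get('rel') == 'image_src' and 'href' in d:
            self.urls.append(d['href'])

    # <link .../> with a self-closing slash arrives here instead
    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)

parser = ImageSrcParser()
parser.feed(SAMPLE)
print(parser.urls)  # → ['https://konachan.net/image/abc.jpg']
```

If the same selector applied to a freshly fetched page yields an empty list, the site layout has changed and the selectors need updating.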
a905448839
Posted on 2023-10-22 23:26:33
Can C# do web scraping?
Ethan惊天
Posted on 2023-10-23 11:30:50
Fish coins
chen_1123
Posted on 2023-11-23 15:55:25
+4 fish coins