Batch-scraping photo albums with Python, Python Discussion, Programming Languages, FishC Forum

tzhang56 posted on 2020-3-11 00:51:32

Batch-scraping photo albums with Python

Last edited by tzhang56 on 2020-3-12 01:52

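I recently found a website with a large collection of photo albums, tens of thousands of them, so I scrape a few whenever I want to browse. The image URLs follow a very regular pattern, so neither BeautifulSoup nor regular expressions are needed.
Take care of yourselves.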
import os
import requests

n = int(input('Starting album number: '))
amount = int(input('How many albums do you want to fetch?\n:'))
location = "D:/pics"
os.makedirs(location, exist_ok=True)            # create the pics folder on drive D
url_0 = "https://mtl.gzhuibei.com/images/img/"  # first half of the album URL
for offset in range(amount):                    # albums n, n+1, ..., n+amount-1
    album = str(n + offset)
    url0 = url_0 + album + "/"                  # album URL
    root = os.path.join(location, album)        # folder the album is saved to
    os.makedirs(root, exist_ok=True)            # create the folder if it is missing
    for num in range(1, 101):                   # an album holds many images; fetch at most 100
        url = url0 + str(num) + ".jpg"          # image URL
        path = os.path.join(root, str(num) + ".jpg")  # images are named by their index
        if os.path.exists(path):
            print("file already exists!")
            continue
        try:
            r = requests.get(url, timeout=10)   # download the image
            if r.status_code == 200:            # save only if the image exists
                with open(path, 'wb') as f:
                    f.write(r.content)
                print("file saved successfully!")
            # any other status code: skip this image
        except (requests.RequestException, OSError) as e:
            print("failed!", e)                 # show why the download or save failed
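
The URL pattern the script relies on can be isolated into small helper functions, which makes it easy to check the generated addresses before any request is sent. This is only a minimal sketch, not the poster's code: image_url and image_path are hypothetical helper names, the album number 1234 is made up, and the <base>/<album>/<index>.jpg layout is simply what the script above assumes about the site.

```python
import os

BASE_URL = "https://mtl.gzhuibei.com/images/img/"  # prefix taken from the script above


def image_url(album, index, base=BASE_URL):
    """Build the URL of image `index` (1-based) in album number `album`."""
    return f"{base}{album}/{index}.jpg"


def image_path(location, album, index):
    """Build the local path for that image: <location>/<album>/<index>.jpg."""
    return os.path.join(location, str(album), f"{index}.jpg")


if __name__ == "__main__":
    # 1234 is a hypothetical album number used only for illustration.
    print(image_url(1234, 1))
    print(image_path("pics", 1234, 1))
```

Using os.path.join instead of concatenating "//" by hand lets the same code build valid paths on Windows and on other systems.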

PYTHON大法牛逼 posted on 2020-3-11 09:08:52

Awesome, here to pay my respects

loveQQW posted on 2020-3-11 09:28:03

666

xiangishi5 posted on 2020-3-11 10:48:16

666

jy02618370 posted on 2020-3-11 11:00:48

Just dropping in to take a look............

jy02618370 posted on 2020-3-11 11:04:30

Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/11.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
.....................What went wrong?

rongmexx posted on 2020-3-11 11:12:14

333

sow007 posted on 2020-3-11 11:21:38

666

tzhang56 posted on 2020-3-11 11:42:51

jy02618370 posted on 2020-3-11 11:04
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/11.py", line 1, in
...

You haven't installed the requests module. Type pip install requests in cmd and it will work! The requests module is a must-have for scraping.
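
Before running the script it can help to confirm that the interpreter in use can actually see the package, since pip may install into a different Python than the one that runs the file. Below is a minimal sketch using the standard library's importlib.util.find_spec; is_installed is a hypothetical helper name, not part of requests or pip.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("requests"):
        print("requests is available")
    else:
        # Install with the same interpreter that will run the script.
        print(f'requests is missing; try: "{sys.executable}" -m pip install requests')
```

Running pip through "python -m pip" ties the installation to that specific interpreter, which avoids the case where the package is installed but the script still reports ModuleNotFoundError.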

yeahwsw posted on 2020-3-11 11:45:56

Here to learn

Cashs posted on 2020-3-11 11:46:13

[emoticon]

伟大的王 posted on 2020-3-11 13:20:51

Impressive

猫将军 posted on 2020-3-11 13:24:40

66666

我是混子 posted on 2020-3-11 14:24:20

Here to pay my respects to the expert

SKY121 posted on 2020-3-11 14:54:05

66666

青门小浪花 posted on 2020-3-11 15:32:03

Holding out my hand

DavidCT posted on 2020-3-11 15:52:12

You can even do this? What a veteran

saz123 posted on 2020-3-11 16:07:13

Awesome, here to pay my respects

年少的梦想 posted on 2020-3-11 16:45:01

Thanks, I'm getting thinner by the day.

shawnlei posted on 2020-3-11 17:20:47

Why does everything come back as "failed"? I'm a beginner
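
The script prints "failed!" from a bare except, which hides the reason for the failure, so the message alone cannot tell a missing save folder from a network error. The following is a minimal sketch that assumes nothing about the actual cause: describe_failure is a hypothetical helper that runs one step and reports the exception type and message instead of a fixed string.

```python
def describe_failure(step, *args, **kwargs):
    """Run `step`; return None on success or a readable description of the error."""
    try:
        step(*args, **kwargs)
    except Exception as exc:  # report the error instead of hiding it
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    import os

    # Creating a folder inside a directory that does not exist fails,
    # and the returned text names the exception that was raised.
    print(describe_failure(os.mkdir, os.path.join("no_such_parent_dir", "album")))
```

Printing a description like this in place of "failed!" shows whether the problem lies with the save folder, the connection, or something else.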
