This post was last edited by 小菜鸟FLY on 2021-8-16 17:18
import urllib.request
import re
import os

folder = 'photo_pa_chong_mm2'
os.makedirs(folder, exist_ok=True)  # create the download folder if it does not exist yet
os.chdir(folder)  # make that folder the current working directory

# The HTTP header is named "User-Agent" (with a hyphen), not "User_Agent"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3868.400 QQBrowser/10.8.4394.400'

# Pages are numbered _1 to _16, so start the range at 1
for num in range(1, 17):
    url = "https://www.jpxgmn.top/MiiTao/MiiTao13958_" + str(num) + ".html"
    req = urllib.request.Request(url)
    req.add_header('User-Agent', user_agent)
    page = urllib.request.urlopen(req)
    html = page.read().decode('UTF-8')
    # Image paths look like /uploadfile/202005/4/4121555610.jpg
    p = r'/uploadfile/.+?jpg'
    name = re.findall(p, html)
    for i in name:
        site = "https://jp.plmn5.com" + str(i)
        req2 = urllib.request.Request(site)
        req2.add_header('User-Agent', user_agent)
        page2 = urllib.request.urlopen(req2)
        photo = page2.read()
        filename = i.split("/")[-1]  # keep only the file name part of the path
        with open(filename, 'wb') as f:
            f.write(photo)
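The weak point of this code is that it can only download the different pages by relying on the pattern in their URLs. A regular expression can find the addresses, but I could not work out how to then fetch the source of those pages in turn. Could an expert help optimize it, so that it starts from the first page of the site and downloads the different girls' galleries directly?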
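One possible way to start from a single page and follow the pagination is sketched below. It is only a rough idea under assumptions I have not checked against the real site: that each page contains ordinary `href="..."` links to the other pages of the same gallery, and that those links contain the gallery name (for example `MiiTao13958`) followed by an optional `_N` suffix. The names `page_link_pattern`, `find_page_urls`, `crawl` and the `gallery_id` parameter are made up for this illustration. The image regex is the one from the script above.

```python
import re
import urllib.request
from urllib.parse import urljoin

USER_AGENT = "Mozilla/5.0"  # placeholder; any browser-like string can go here
IMAGE_PATH = re.compile(r'/uploadfile/.+?jpg')  # same pattern as the original script


def fetch(url):
    """Download a page and return its source as text."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as page:
        return page.read().decode("utf-8")


def page_link_pattern(gallery_id):
    # Assumption: pagination links look like href=".../<gallery_id>_3.html"
    return re.compile(
        r'href="([^"]*' + re.escape(gallery_id) + r'(?:_\d+)?\.html)"'
    )


def find_page_urls(html, base_url, gallery_id):
    """Return absolute URLs of the gallery's page links found in html."""
    urls = []
    for href in page_link_pattern(gallery_id).findall(html):
        absolute = urljoin(base_url, href)  # resolves relative links
        if absolute not in urls:
            urls.append(absolute)
    return urls


def crawl(first_url, gallery_id, fetch_page=fetch):
    """Visit every page reachable from first_url and collect image paths."""
    seen = {first_url}
    queue = [first_url]
    images = []
    while queue:
        url = queue.pop(0)
        html = fetch_page(url)
        for path in IMAGE_PATH.findall(html):
            if path not in images:
                images.append(path)
        for next_url in find_page_urls(html, url, gallery_id):
            if next_url not in seen:  # never fetch the same page twice
                seen.add(next_url)
                queue.append(next_url)
    return images
```

Each path returned by `crawl` could then be downloaded exactly as in the original script, by prefixing it with "https://jp.plmn5.com" and writing the bytes to a file. The `seen` set is what keeps the loop from going around in circles when pages link back to each other.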