Posted on 2023-11-28 19:58:25
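One more revision. I noticed the crawler wasn't picking up download addresses for files, so each file needs a second request to this endpoint:

```
http://www.038909.xyz:5678/api/fs/get
```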
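Here is a minimal sketch of how that second request's reply could be handled. It assumes, as with AList-style servers (which the `/api/fs/list` + `/api/fs/get` layout resembles), that a successful reply carries the download link in `data['raw_url']`. `extract_raw_url` and `fs_get_payload` are hypothetical helper names, and the sample reply is made up, not captured from this server.

```python
import json


def fs_get_payload(path: str, password: str = "") -> dict:
    # Same request body the script sends to /api/fs/get
    return {"path": path, "password": password}


def extract_raw_url(body: str) -> str:
    # Assumed reply shape: {"code": 200, "message": ..., "data": {"raw_url": ...}}
    reply = json.loads(body)
    if reply.get("code") != 200:
        # Same convention as read_json(): any non-200 code is treated as an error
        raise ValueError(reply.get("message", "unknown error"))
    data = reply.get("data") or {}
    raw_url = data.get("raw_url")
    if not raw_url:
        raise KeyError("reply has no raw_url field")
    return raw_url


if __name__ == "__main__":
    sample = json.dumps({
        "code": 200,
        "message": "success",
        "data": {"name": "demo.mp4", "raw_url": "http://example.com/d/demo.mp4"},
    })
    print(fs_get_payload("/demo/demo.mp4"))
    print(extract_raw_url(sample))
```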
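After more than two hours of crawling it had only got as far as "/体育/健身/【赛普健身专业视频】价值19800元/实用工具图表/体质判断对照图" (roughly: Sports / Fitness / [Saipu Fitness professional videos] worth 19,800 yuan / Practical tools and charts / Body-type assessment chart) ^_^
Then the server dropped the connection. It looks like my IP got banned ^_^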
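The script's `@retry(delay = 5, ...)` retries every 5 seconds with no upper limit, so a blocked client keeps knocking at a steady rate. One option, and it is only a guess that this makes a ban less likely, is to let the wait grow after each failure. Below is a minimal sketch of capped exponential backoff. `backoff_delays` and `call_with_backoff` are hypothetical helpers, and the numbers (5 s base, doubling, 300 s cap) are illustrative, not known limits of this server.

```python
import random
import time


def backoff_delays(base=5.0, factor=2.0, cap=300.0, attempts=6, jitter=0.0):
    """Yield wait times base, base*factor, base*factor**2, ... capped at `cap`."""
    delay = base
    for _ in range(attempts):
        # Optional random jitter spreads out retries from several clients
        yield min(delay, cap) + random.uniform(0, jitter)
        delay *= factor


def call_with_backoff(func, *, sleep=time.sleep, **delay_kwargs):
    """Call func(); after each failure sleep for the next delay and try again."""
    for wait in backoff_delays(**delay_kwargs):
        try:
            return func()
        except Exception:
            sleep(wait)
    return func()  # final attempt: its exception, if any, propagates
```

If you stay with the `retry` package, its decorator also accepts `backoff` and `max_delay` arguments, e.g. `@retry(delay=5, backoff=2, max_delay=300)`, which gives a similar schedule.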
```python
#!/usr/bin/env python
# coding=utf-8
import itertools
import json
import logging
import sys
import time

import requests
from retry import retry  # third-party package: pip install retry

logging.basicConfig(stream=sys.stderr, level=logging.WARNING)

headers = {"User-Agent": "Mozilla/6.0 (X11; Linux x86_64; rv:109.0) Gecko/20120101 Firefox/139.0"}
dir_url = 'http://www.038909.xyz:5678/api/fs/list'  # lists one directory, page by page
file_url = 'http://www.038909.xyz:5678/api/fs/get'  # details of one file, including its download address


# POST json_ to url and return the 'data' field.
# On any exception, wait 5 s and retry (forever: the package's default is tries=-1).
# Note that this also retries permanent server errors forever.
@retry(delay=5, logger=logging.getLogger())
def read_json(path, url, json_):
    time.sleep(1)  # throttle to roughly one request per second
    # timeout makes a dropped connection raise (and be retried) instead of hanging
    response = requests.post(url, json=json_, headers=headers, timeout=30)
    reply = response.json()
    if reply['code'] != 200:
        raise ValueError(reply['message'])
    return reply['data']


# Second request per file: fetch its details, including the download address
def read_file(path):
    print(path, file=sys.stderr)
    json_ = {"path": path, "password": ""}
    return read_json(path, file_url, json_)


# List a directory page by page, recurse into subdirectories, query every file
def read_dir(path):
    print(path, file=sys.stderr)
    content = []
    for page in itertools.count(1):
        json_ = {"path": path, "password": "", "page": page, "per_page": 30, "refresh": False}
        data = read_json(path, dir_url, json_)
        if not data['content']:  # None or empty page: stop instead of looping forever
            break
        content.extend(data['content'])
        if len(content) >= data['total']:
            break
    for i in range(len(content)):
        is_dir = content[i]['is_dir']
        name = content[i]['name']
        if is_dir:
            content[i]['dir_content'] = read_dir(path + name + '/')
        else:
            content[i] = read_file(path + name)
    return content


#content = read_dir('/游戏/PC/07.运行库/')  # crawl a single subtree instead
content = read_dir('/')
print(json.dumps(content, ensure_ascii=False))  # emit JSON, keeping Chinese names readable
```
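The crawl's result is a nested list: directories carry their children under `dir_content`, and each file entry is replaced by its `/api/fs/get` data. To turn that into a flat list of download links, a small recursive walk could work. `iter_files` is a hypothetical helper, and reading the link from `raw_url` is an assumption about this server's reply, not something confirmed here. The default `prefix="/"` matches `read_dir('/')`. If you start from a subtree, pass that path instead.

```python
def iter_files(entries, prefix="/"):
    """Yield (full_path, raw_url) for every file in the nested crawl result."""
    for entry in entries:
        name = entry.get("name", "")
        if "dir_content" in entry:
            # Directory: read_dir() stored its children under 'dir_content'
            yield from iter_files(entry["dir_content"], prefix + name + "/")
        else:
            # File: /api/fs/get data; 'raw_url' is assumed to be the link (None if absent)
            yield prefix + name, entry.get("raw_url")


if __name__ == "__main__":
    tree = [
        {"name": "sports", "is_dir": True, "dir_content": [
            {"name": "a.mp4", "is_dir": False, "raw_url": "http://example.com/a.mp4"},
        ]},
        {"name": "readme.txt", "is_dir": False, "raw_url": "http://example.com/readme.txt"},
    ]
    for path, url in iter_files(tree):
        print(path, url, sep="\t")
```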