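OK. This error occurs because the number of test-set files requested is either greater than the number of available CSV files or negative. We need to add a check to the code so this cannot happen.
Here is the revised code: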
import os
import shutil
import random
from collections import defaultdict
def move_csv_files(source_dir, target_dir):
    # Collect the paths of all CSV files under source_dir
    csv_files = []
    for root, dirs, files in os.walk(source_dir):
        for file in files:
            if file.endswith(".csv"):
                csv_files.append(os.path.join(root, file))

    # Make sure there are enough files to split
    if len(csv_files) < 2:
        print("Not enough CSV files to create a test set.")
        return

    # Decide the size of the test set (assumed here to be 20%)
    num_test_files = max(2, int(0.2 * len(csv_files)))
    if num_test_files > len(csv_files):
        num_test_files = len(csv_files)

    # Randomly choose the test-set files
    test_files = random.sample(csv_files, num_test_files)

    # Move the test-set files into the matching subfolders under target_dir,
    # preserving the source folder structure
    for file in test_files:
        rel_path = os.path.relpath(file, source_dir)
        target_path = os.path.join(target_dir, rel_path)
        target_folder = os.path.dirname(target_path)

        # Create the target folder if it does not exist yet
        if not os.path.exists(target_folder):
            os.makedirs(target_folder)
            print(f"Created directory: {target_folder}")

        # Move the file
        shutil.move(file, target_path)
        print(f"Moved file {file} to {target_path}")

    # Make sure each subfolder has at least two test-set files.
    # Note: only folders that already received at least one test file are topped up.
    test_files_dict = defaultdict(list)
    for file in test_files:
        rel_path = os.path.relpath(file, source_dir)
        folder = os.path.dirname(rel_path)
        test_files_dict[folder].append(file)

    for folder, files in test_files_dict.items():
        if len(files) < 2:
            print(f"Folder {folder} has less than 2 test files. Adding more files...")
            # Candidates: CSV files from the same folder that have not been moved yet
            additional_files = [
                file for file in csv_files
                if os.path.dirname(os.path.relpath(file, source_dir)) == folder
                and file not in files
            ]
            while len(files) < 2 and additional_files:
                additional_file = additional_files.pop(0)
                rel_path = os.path.relpath(additional_file, source_dir)
                target_path = os.path.join(target_dir, rel_path)
                shutil.move(additional_file, target_path)
                files.append(additional_file)
                print(f"Moved file {additional_file} to {target_path}")

# Example usage:
source_directory = '/path/to/source_directory'
target_directory = '/path/to/target_directory'
move_csv_files(source_directory, target_directory)
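
If the per-folder minimum matters more than the overall 20% ratio, one alternative is to make the selection folder by folder before anything is moved. The following is only a rough sketch under that assumption: `select_per_folder` and its `ratio`, `minimum`, and `seed` parameters are hypothetical names, not part of the code above, and the function only returns paths without touching the file system.

```python
import os
import random
from collections import defaultdict


def select_per_folder(csv_files, source_dir, ratio=0.2, minimum=2, seed=None):
    """Pick test files folder by folder (hypothetical helper; moves nothing)."""
    rng = random.Random(seed)

    # Group the CSV paths by their folder relative to source_dir
    by_folder = defaultdict(list)
    for path in csv_files:
        folder = os.path.dirname(os.path.relpath(path, source_dir))
        by_folder[folder].append(path)

    selected = []
    for folder in sorted(by_folder):
        files = sorted(by_folder[folder])
        # At least `minimum` files per folder, but never more than the folder holds
        k = min(len(files), max(minimum, int(ratio * len(files))))
        selected.extend(rng.sample(files, k))
    return selected
```

With this approach, a folder that contains fewer than two CSV files contributes all of its files to the test set, so you may want to skip such folders instead.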
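Main improvements:
1. **File count check**: before choosing the test-set files, the code makes sure there are enough files to split.
2. **Corrected `num_test_files` calculation**: the computed test-set size can no longer exceed the number of available CSV files.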
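
The two points above can be isolated into a small helper, which makes the size check easier to test on its own. This is a minimal sketch, not the only way to do it: `pick_test_files` and its `ratio`, `minimum`, and `seed` parameters are hypothetical names introduced for illustration.

```python
import random


def pick_test_files(csv_files, ratio=0.2, minimum=2, seed=None):
    """Return a random subset of csv_files without ever over-sampling."""
    if not csv_files:
        return []
    # Desired size: the given ratio of the files, but at least `minimum`
    wanted = max(minimum, int(ratio * len(csv_files)))
    # Clamp to the population size so random.sample cannot raise ValueError
    k = min(wanted, len(csv_files))
    rng = random.Random(seed)
    return rng.sample(csv_files, k)
```

Because the sample size is clamped to `len(csv_files)`, calling the helper with a single file returns just that file rather than raising an error.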
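Please try running this code again, and make sure the source and target paths are set correctly. If you still run into problems, let me know!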
The reply above comes from the FishC AI assistant. If it did not answer your question correctly, please ask a follow-up question.