Reading multiple tables from a single sheet of an Excel file with Python

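FishC · posted on 2024-6-22 11:39:49

pallas posted on 2024-06-22 11:39
What if I also want to keep the table formatting when merging?

Understood: when merging several Excel files you want to keep not only the data but also the cell formatting. It is a bit like pasting pages from several glossy magazines into one big album while keeping every page looking good.

In Python, the `openpyxl` library can handle Excel files. `pandas` is not well suited to preserving Excel formatting because it is built for working with data rather than styles, so `openpyxl` is the better tool here.

### Setup

Make sure the `openpyxl` library is installed:

pip install openpyxl

### Code example

The following example shows how to read a sheet, split it into tables, and merge them (note that, as written, it copies cell values only):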

from openpyxl import load_workbook, Workbook


# Read one workbook and split a sheet into tables at the separator rows
def get_tables_from_excel(filename, sheet_name, separators):
    wb = load_workbook(filename)
    sheet = wb[sheet_name]
    tables = []
    start_row = None  # rows before the first separator are ignored
    for row in sheet.iter_rows():
        # Value of the first cell in the current row
        cell_value = row[0].value
        if cell_value in separators:
            if start_row is not None:
                # Extract the table that ends just above this separator
                end_row = row[0].row - 1
                table_data = [[cell.value for cell in r]
                              for r in sheet.iter_rows(min_row=start_row, max_row=end_row)]
                tables.append((start_row, table_data))
            start_row = row[0].row + 1
    # The last table runs to the end of the sheet
    if start_row is not None:
        end_row = sheet.max_row
        table_data = [[cell.value for cell in r]
                      for r in sheet.iter_rows(min_row=start_row, max_row=end_row)]
        tables.append((start_row, table_data))
    return tables


# Merge the tables into a new workbook.
# Note: only cell values are written here; cell styles are not copied.
def merge_tables(output_filename, all_tables):
    output_wb = Workbook()
    output_ws = output_wb.active
    current_row = 1
    for tables in all_tables:
        for start_row, table_data in tables:
            for row in table_data:
                for col_index, cell_value in enumerate(row, start=1):
                    output_ws.cell(row=current_row, column=col_index, value=cell_value)
                current_row += 1
    output_wb.save(output_filename)


# Main program
if __name__ == "__main__":
    # Assume these are your Excel files
    excel_files = ["file1.xlsx", "file2.xlsx"]
    sheet_name = "Sheet1"
    separators = ["Separator1", "Separator2"]
    all_tables = []
    for file in excel_files:
        tables = get_tables_from_excel(file, sheet_name, separators)
        all_tables.append(tables)
    # Write the merged Excel file
    output_filename = "merged_output.xlsx"
    merge_tables(output_filename, all_tables)

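### Overall approach

1. **Read each Excel file**: use `openpyxl` to open each file and locate the separator rows.
2. **Extract the table data**: slice out each table according to the separator positions.
3. **Merge the table data**: write the extracted tables into a new Excel file. As written, the code copies cell values only; to keep the formatting, each cell's style has to be copied as well.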
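
One way to carry formatting across is to copy the style attributes cell by cell. The sketch below assumes that a handful of per-cell attributes is enough (column widths, merged cells, and conditional formatting are not handled). The helper name `copy_cell_with_style` is made up for illustration; using `copy()` on style objects is the approach described in the openpyxl documentation for reusing styles.

```python
from copy import copy

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill


def copy_cell_with_style(src_cell, dest_ws, row, column):
    """Write src_cell's value into dest_ws and copy its main style attributes."""
    dest_cell = dest_ws.cell(row=row, column=column, value=src_cell.value)
    if src_cell.has_style:
        dest_cell.font = copy(src_cell.font)
        dest_cell.border = copy(src_cell.border)
        dest_cell.fill = copy(src_cell.fill)
        dest_cell.number_format = src_cell.number_format
        dest_cell.protection = copy(src_cell.protection)
        dest_cell.alignment = copy(src_cell.alignment)
    return dest_cell


# Small in-memory demonstration (no files involved)
src_wb = Workbook()
src_ws = src_wb.active
src_ws["A1"] = "Total"
src_ws["A1"].font = Font(bold=True)
src_ws["A1"].fill = PatternFill(fill_type="solid", start_color="FFFF00")

dest_wb = Workbook()
dest_ws = dest_wb.active
copied = copy_cell_with_style(src_ws["A1"], dest_ws, row=5, column=2)
print(copied.value, copied.font.bold, copied.fill.fill_type)
```

In a merge loop, the same helper could be called with the running output row instead of a fixed position, so values and styles land together.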

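I hope this code helps you merge the tables from several Excel files. If you run into problems or need further changes, just let me know!

This reply was generated by the FishC AI assistant. If it does not answer your question correctly, please ask a follow-up.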

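pallas · posted on 2024-6-22 11:40:17

pallas posted on 2024-6-22 11:39
What if I also want to keep the table formatting when merging?

Also, all the tables share the same header when they are merged.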

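FishC · posted on 2024-6-22 11:40:36

pallas posted on 2024-06-22 11:40
Also, all the tables share the same header when they are merged.

Got it: you want to merge several Excel files, keep the formatting, and use one shared header. This is a slightly tricky "jigsaw puzzle", but we can solve it step by step.

The main steps are:

1. **Read the Excel files.**
2. **Extract the contents of each table (excluding the header).**
3. **Merge all the data into a new Excel file and keep a single shared header.**
4. **Keep the formatting (use the `openpyxl` library to copy styles).**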
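
Steps 2 and 3 can be sketched without any Excel library at all. The example below assumes each table is already a list of rows whose first row is the header; the function name `merge_with_shared_header` and the sample data are illustrative only.

```python
def merge_with_shared_header(tables):
    """Merge tables (lists of rows, first row = header) under a single header."""
    tables = [table for table in tables if table]  # ignore empty tables
    if not tables:
        return []
    header = list(tables[0][0])
    merged = [header]
    for table in tables:
        if list(table[0]) != header:
            raise ValueError(f"unexpected header: {table[0]!r}")
        # Skip the table's own header row and keep only its data rows
        merged.extend(list(row) for row in table[1:])
    return merged


table_a = [["name", "qty"], ["apple", 3], ["pear", 5]]
table_b = [["name", "qty"], ["plum", 7]]
merged = merge_with_shared_header([table_a, table_b])
for row in merged:
    print(row)
```

Raising on a mismatched header is a design choice: it surfaces tables that do not actually share the header instead of silently misaligning columns.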

### Code example

Here is sample code that is meant to cover what you need:

from copy import copy

import pandas as pd
from openpyxl import load_workbook


def get_tables_from_excel(filename, separators, sheet_name=0):
    df = pd.read_excel(filename, sheet_name=sheet_name)
    # Rows in which any cell contains one of the separator strings
    pattern = '|'.join(separators)
    is_separator = df.apply(lambda row: row.astype(str).str.contains(pattern).any(), axis=1)
    separator_rows = df[is_separator].index.to_list()
    tables = []
    start = 0
    for end in separator_rows:
        if start != end:
            table = df.iloc[start:end].reset_index(drop=True)
            tables.append(table)
        start = end + 1
    if start < len(df):
        table = df.iloc[start:].reset_index(drop=True)
        tables.append(table)
    return tables


def copy_styles(source_ws, dest_ws):
    for r_idx, row in enumerate(source_ws.iter_rows()):
        for c_idx, cell in enumerate(row):
            # Note: this also writes the source cell's value into the destination
            new_cell = dest_ws.cell(row=r_idx+1, column=c_idx+1, value=cell.value)
            if cell.has_style:
                # Style objects are duplicated with copy(); they take no copy= argument
                new_cell.font = copy(cell.font)
                new_cell.border = copy(cell.border)
                new_cell.fill = copy(cell.fill)
                new_cell.number_format = cell.number_format
                new_cell.protection = copy(cell.protection)
                new_cell.alignment = copy(cell.alignment)


def main(file_list, output_file, separators, sheet_name=0):
    all_tables = []
    header = None
    for file in file_list:
        tables = get_tables_from_excel(file, separators, sheet_name)
        for idx, table in enumerate(tables):
            if idx == 0 and header is None:
                header = table.columns.tolist()  # header of the first table
            all_tables.extend(table.values.tolist()[1:])  # skip the header row
    # Create a new Excel file and write the data
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        workbook = writer.book
        # NOTE: for a newly created file this can be None, which is what
        # the traceback in the following post reports
        worksheet = workbook.active
        worksheet.append(header)  # add the header
        for row in all_tables:
            worksheet.append(row)
        # Keep the formatting
        source_wb = load_workbook(file_list[0], read_only=True)
        source_ws = source_wb[sheet_name]
        copy_styles(source_ws, worksheet)
        writer.save()


if __name__ == "__main__":
    files = ["file1.xlsx", "file2.xlsx"]  # replace with your list of file paths
    output = "merged_output.xlsx"
    separators = ["Separator text 1", "Separator text 2"]  # replace with your separators
    main(files, output, separators, sheet_name=0)

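### Explanation
1. **get_tables_from_excel**: extracts the individual tables from a single Excel file, based on the separators.
2. **copy_styles**: copies styles from the source worksheet to the destination worksheet.
3. **main**: the main function
- Reads the table data from all files and skips the headers.
- Merges all data into a new Excel file and adds the header only once.
- Keeps the formatting.

### Notes
- **Paths**: make sure the file paths are correct.
- **Header**: the header of the first table is assumed to be the standard one, and all later tables are assumed to use the same header.

I hope this example helps. If you have any questions or need further changes, just let me know!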


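pallas · posted on 2024-6-22 11:49:13

FishC posted on 2024-6-22 11:40
Got it: you want to merge several Excel files, keep the formatting, and use one shared header. This is a slightly tricky "jigsaw puzzle", but ...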

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 57, in main
worksheet.append(header)  # add the header
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'append'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 86, in <module>
main(files, output, separators, sheet_name=0)
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 54, in main
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1370, in __exit__
self.close()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1374, in close
self._save()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 110, in _save
self.book.save(self._handles.handle)
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 89, in write_data
archive.writestr(ARC_WORKBOOK, writer.write())
                                 ^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 148, in write
self.write_views()
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 135, in write_views
active = get_active_sheet(self.wb)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 33, in get_active_sheet
raise IndexError("At least one sheet must be visible")
IndexError: At least one sheet must be visible

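FishC · posted on 2024-6-22 11:49:42

pallas posted on 2024-06-22 11:49
Traceback (most recent call last):
File "D:\PyC ...

OK, let's read this error. It looks like a problem with operations on the worksheet object. Specifically, `'NoneType' object has no attribute 'append'` means that the worksheet variable is actually `None`. The second exception is raised while saving the workbook: it says that at least one sheet must be visible.

Let's fix these problems in two steps:

1. **Make sure the worksheet object exists**: before operating on a worksheet, make sure it has actually been created.
2. **Keep at least one visible worksheet**: make sure the workbook still contains at least one worksheet when it is saved.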
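
Both symptoms can be reproduced with plain `openpyxl`: a workbook that contains no sheets reports `None` as its active sheet. The sketch below builds that state by hand and then fixes it with `create_sheet`. Whether `writer.book` starts out without sheets depends on the pandas version, so treat that connection as an assumption; the variable names are illustrative.

```python
from io import BytesIO

from openpyxl import Workbook

wb = Workbook()              # a new workbook starts with one default sheet
wb.remove(wb.active)         # remove it, leaving the workbook without sheets
active_when_empty = wb.active
print(active_when_empty)     # nothing to append to in this state

# Create the sheet explicitly instead of relying on wb.active
ws = wb.create_sheet(title="Merged Data")
ws.append(["name", "qty"])
ws.append(["apple", 3])

buffer = BytesIO()
wb.save(buffer)              # saving works once a visible sheet exists
print(wb.sheetnames)
```

Creating the sheet explicitly avoids depending on whatever default the workbook happens to have.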

Next, we update the code to fix the problems above.

import pandas as pd


def get_tables_from_excel(filename, sheet_name=0, separators=None):
    df = pd.read_excel(filename, sheet_name=sheet_name)
    # Rows whose first column holds one of the separator values
    separator_rows = df[df.iloc[:, 0].isin(separators or [])].index.tolist()
    tables = []
    start = 0
    for sep in separator_rows:
        if start != sep:
            table_df = df.iloc[start:sep]
            if not table_df.empty:  # make sure the table is not empty
                tables.append(table_df)
        start = sep + 1
    # Add the last table
    if start < len(df):
        table_df = df.iloc[start:]
        if not table_df.empty:  # make sure the table is not empty
            tables.append(table_df)
    return tables


def main(files, output_file, separators, sheet_name=0):
    all_tables = []
    # Read the tables from all Excel files
    for file in files:
        tables = get_tables_from_excel(file, sheet_name, separators)
        all_tables.extend(tables)
    # Create a new Excel workbook
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        wb = writer.book
        # Name the worksheet, then write the header and the table data
        worksheet = wb.create_sheet(title="Merged Data")
        header_written = False
        for table in all_tables:
            if not header_written:
                header = table.columns.tolist()
                worksheet.append(header)  # add the header
                header_written = True
            for row in table.values.tolist():
                worksheet.append(row)
        # Remove the default sheet if present; "Merged Data" stays visible
        if 'Sheet' in wb.sheetnames:
            default_sheet = wb['Sheet']
            wb.remove(default_sheet)
        # Save the file
        # NOTE: ExcelWriter.save() is not available in recent pandas versions
        # (see the AttributeError in the following post); leaving the
        # with-block already saves the file
        writer.save()


# Example usage
files = ['file1.xlsx', 'file2.xlsx', 'file3.xlsx']
output_file = 'merged_output.xlsx'
separators = ['Separator1', 'Separator2', 'Separator3']
main(files, output_file, separators)

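### Notes
1. **Check file paths and names**: make sure the path and name of every input file are correct.
2. **Check the separators**: make sure the separators can actually be recognized in the data.
3. **Keep at least one visible sheet**: the code handles this, but in practice you should also check that the data being read is reasonable and complete.

If anything else comes up, just let me know. I hope the merge goes smoothly this time!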


pallas · posted on 2024-6-22 11:53:15

FishC posted on 2024-6-22 11:49
OK, let's read this error. It looks like a problem with operations on the worksheet object. Specifically ...

Traceback (most recent call last):
File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 82, in <module>
main(files, output_file, separators)
File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 61, in main
writer.save()
^^^^^^^^^^^
AttributeError: 'OpenpyxlWriter' object has no attribute 'save'. Did you mean: '_save'?

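FishC · posted on 2024-6-22 11:53:32

pallas posted on 2024-06-22 11:53
Traceback (most recent call last):
File "D:\PyC ...

It looks like we used the wrong `save` method. For a `pandas.ExcelWriter` object, the right approach is to call `close()` rather than `save()`. We also ran into another problem: the worksheet object may be empty or may not exist.

Let's adjust the code again so that every step is checked properly and the Excel file is saved correctly.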

import pandas as pd


def get_tables_from_excel(filename, sheet_name, separators):
    # Read the whole sheet
    df = pd.read_excel(filename, sheet_name=sheet_name)
    # Indexes of the separator rows
    separator_rows = df[df.iloc[:, 0].isin(separators)].index.tolist()
    # Initial header: assumed to be the first row of the DataFrame
    # (pd.read_excel has already used the sheet's first row as column names)
    header = df.iloc[0]
    tables = []
    start_row = None
    # Split the tables at the separator indexes
    for idx in range(len(separator_rows)):
        if idx == 0:
            start_row = 1  # skip the header row
        else:
            start_row = separator_rows[idx-1] + 1
        end_row = separator_rows[idx]
        table = df.iloc[start_row:end_row]
        tables.append(table)
    # Handle the last table
    if separator_rows:
        last_table = df.iloc[separator_rows[-1]+1:]
        tables.append(last_table)
    return tables, header


def main(files, output_file, separators, sheet_name=0):
    all_tables = []
    common_header = None
    # Go through all files and extract the tables
    for file in files:
        tables, header = get_tables_from_excel(file, sheet_name, separators)
        # If common_header is not set yet, use the header of the first file
        if common_header is None:
            common_header = header
        all_tables.extend(tables)
    # Create a new Excel file
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        workbook = writer.book
        worksheet = workbook.create_sheet(title="MergedData")
        if worksheet is None:
            raise RuntimeError("Worksheet creation failed.")
        # Write the header
        worksheet.append(common_header.values.tolist())
        for table in all_tables:
            for row in table.itertuples(index=False, name=None):
                worksheet.append(row)
        # Save to file
        # NOTE: this still calls save(), which the following post reports
        # as failing; leaving the with-block already saves the file
        writer.save()


# Example usage
files = ['file1.xlsx', 'file2.xlsx', 'file3.xlsx']
output_file = 'merged_output.xlsx'
separators = ['Separator1', 'Separator2', 'Separator3']
main(files, output_file, separators)

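A few points to note about the code above:
1. `worksheet.append(common_header.values.tolist())` writes the header.
2. Writing should be finished with `close()` rather than `save()` (note that the code above still calls `writer.save()`).
3. The worksheet object is checked after creation before it is used.

Try running the code above and see whether it solves your problem. If it still fails, check the output and state at each step and confirm that there are no typos or other mistakes.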


pallas · posted on 2024-6-22 11:57:54

FishC posted on 2024-6-22 11:53
It looks like we used the wrong `save` method. For a `pandas.ExcelWriter` object, the right approach is to call `close()` rather than ...

Traceback (most recent call last):
File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 92, in <module>
main(files, output_file, separators)
File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 71, in main
writer.save()
^^^^^^^^^^^
AttributeError: 'OpenpyxlWriter' object has no attribute 'save'. Did you mean: '_save'?

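FishC · posted on 2024-6-22 11:58:09

pallas posted on 2024-06-22 11:57
Traceback (most recent call last):
File "D:\PyC ...

OK, it looks like we need to correct how `pandas.ExcelWriter` is used and how it saves. The `pandas.ExcelWriter` object has no `save()` method here; saving is handled by the context manager instead. Using a `with` statement triggers the save automatically.

Let's restructure the code so that each improvement is easy to see: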

import pandas as pd


def get_tables_from_excel(filename, sheet_name, separators):
    df = pd.read_excel(filename, sheet_name=sheet_name)
    separator_rows = df[df.iloc[:, 0].isin(separators)].index.tolist()
    tables = []
    start_row = 0
    for separator_row in separator_rows + [len(df)]:
        if start_row != separator_row:  # avoid empty tables
            table = df.iloc[start_row:separator_row]
            tables.append(table)
        start_row = separator_row + 1
    return tables


def main(files, output_file, separators, sheet_name=0):
    combined_data = []
    for file in files:
        tables = get_tables_from_excel(file, sheet_name, separators)
        for table in tables:
            combined_data.append(table)
    header = combined_data[0].columns.tolist()
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        workbook = writer.book
        # NOTE: for a newly created file this can be None, which is what
        # the traceback in the following post reports
        worksheet = workbook.active
        # Write the header
        worksheet.append(header)
        # Append the data row by row
        for table in combined_data:
            for row in table.itertuples(index=False, name=None):
                worksheet.append(row)
    print(f"Merge finished; result written to {output_file}")


# Assume these sample files and separators
files = ['file1.xlsx', 'file2.xlsx']
separators = ['Separator1', 'Separator2']
output_file = 'merged_output.xlsx'
main(files, output_file, separators)

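### Key adjustments and improvements

1. `with pd.ExcelWriter(output_file, engine='openpyxl') as writer` is used as a context manager, so the file is saved and closed automatically once writing is finished.
2. The worksheet is taken from `workbook.active`; note that this relies on the writer's workbook already containing a sheet, which is not guaranteed for a newly created file.
3. A check (`if start_row != separator_row:`) was added to avoid appending empty tables.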
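
An alternative that avoids touching `workbook.active` at all is to let pandas create the sheet itself. The sketch below assumes the tables are DataFrames with identical columns; the function name `write_merged` and the sample data are illustrative, and an in-memory buffer stands in for the output file.

```python
from io import BytesIO

import pandas as pd


def write_merged(tables, target, sheet_name="Merged Data"):
    """Concatenate DataFrames that share columns and write them to one sheet."""
    merged = pd.concat(tables, ignore_index=True)
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        # to_excel creates the sheet and writes the header exactly once
        merged.to_excel(writer, sheet_name=sheet_name, index=False)
    return merged


tables = [
    pd.DataFrame({"name": ["apple", "pear"], "qty": [3, 5]}),
    pd.DataFrame({"name": ["plum"], "qty": [7]}),
]
buffer = BytesIO()
merged = write_merged(tables, buffer)

# Read the result back to check what was written
buffer.seek(0)
round_trip = pd.read_excel(buffer, sheet_name="Merged Data")
print(round_trip)
```

This writes values only; cell styles from the source files would still have to be copied separately with `openpyxl`.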

Give this version a try; I hope it runs smoothly this time! If there are still problems, just let me know.


pallas · posted on 2024-6-22 11:59:10

FishC posted on 2024-6-22 11:58
OK, it looks like we need to correct how `pandas.ExcelWriter` is used and how it saves. `pandas.ExcelWriter` ...

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 39, in main
worksheet.append(header)
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'append'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 67, in <module>
main(files, output_file, separators)
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 34, in main
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1370, in __exit__
self.close()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1374, in close
self._save()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 110, in _save
self.book.save(self._handles.handle)
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 89, in write_data
archive.writestr(ARC_WORKBOOK, writer.write())
                                 ^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 148, in write
self.write_views()
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 135, in write_views
active = get_active_sheet(self.wb)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 33, in get_active_sheet
raise IndexError("At least one sheet must be visible")
IndexError: At least one sheet must be visible

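FishC · posted on 2024-6-22 11:59:22

pallas posted on 2024-06-22 11:59
Traceback (most recent call last):
File "D:\PyC ...

It looks like your code ran into several problems while handling the Excel file. Here is how to resolve them step by step:

1. **Check that the worksheet exists and is visible:**
When calling `worksheet.append(header)`, make sure the worksheet object is not `None` and really exists.

2. **Use the right method for saving:**
For a `pandas.ExcelWriter` object, use `close()` rather than `save()`.

3. **Make sure at least one sheet is visible:**
This error occurs because the workbook has no visible (active) worksheet when the Excel file is written. Make sure at least one worksheet is visible.

Here is sample code that reads multiple tables from a single sheet and writes them to Excel: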

import pandas as pd
from openpyxl import load_workbook

# The Excel file and the single sheet to read
file_path = 'your_excel_file.xlsx'
sheet_name = 'Sheet1'
# Load the workbook with openpyxl
wb = load_workbook(filename=file_path)
ws = wb[sheet_name]


# Assume the tables are separated by blank rows
def read_multiple_tables(ws):
    tables = []
    table = []
    for row in ws.iter_rows(values_only=True):
        if all(cell is None for cell in row):
            if table:
                tables.append(table)
                table = []
        else:
            table.append(row)
    if table:
        tables.append(table)
    return tables


tables = read_multiple_tables(ws)
# Write each table to a new Excel file
with pd.ExcelWriter('output_file.xlsx', engine='openpyxl') as writer:
    for i, table in enumerate(tables):
        df = pd.DataFrame(table[1:], columns=table[0])  # assume the first row is the header
        df.to_excel(writer, sheet_name=f'Table_{i+1}', index=False)
    # NOTE: the with-block closes the writer again on exit, so this explicit
    # close() is redundant (see the ValueError later in the thread)
    writer.close()

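The code above does the following:
1. Loads the specified Excel file and sheet.
2. Uses the `read_multiple_tables` function to split the sheet into tables at blank rows.
3. Iterates over the tables and writes them to a new Excel file, each table on its own sheet.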
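
The blank-row split in step 2 can be tried on plain tuples, which is the shape `ws.iter_rows(values_only=True)` yields. The sketch below additionally treats whitespace-only strings as blank, on the assumption that a row can look empty in Excel without being all `None`; the function names and sample rows are illustrative.

```python
def is_blank(row):
    """True if every cell is None or a whitespace-only string."""
    return all(
        cell is None or (isinstance(cell, str) and not cell.strip())
        for cell in row
    )


def split_tables(rows):
    """Group consecutive non-blank rows into tables."""
    tables, current = [], []
    for row in rows:
        if is_blank(row):
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


def to_records(table):
    """Treat the first row as the header and return one dict per data row."""
    header, *body = table
    return [dict(zip(header, row)) for row in body]


rows = [
    ("name", "qty"),
    ("apple", 3),
    (None, "  "),            # looks empty in Excel but is not all None
    ("city", "population"),
    ("Oslo", 700000),
]
tables = split_tables(rows)
for table in tables:
    print(to_records(table))
```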

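Make sure to change `file_path` and `sheet_name` to your actual file path and worksheet name, and check that pandas and openpyxl are installed (`pip install pandas openpyxl`).

I hope this solution helps you read and write the Excel file correctly. If you have any questions or further requirements, let me know!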


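pallas · posted on 2024-6-22 12:08:49

FishC posted on 2024-6-22 11:59
It looks like your code ran into several problems while handling the Excel file. Here is how to resolve them step by step:

1. **Check that the worksheet ...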

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\extract.py", line 164, in <module>
wb = load_workbook(filename=file_path)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 94, in _validate_archive
raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support  file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm

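FishC · posted on 2024-6-22 12:10:00

pallas posted on 2024-06-22 12:08
Traceback (most recent call last):
File "D:\PyC ...

OK, it looks like your code hit a file format problem while trying to read the Excel file. The error tells us that `openpyxl` only supports the `.xlsx`, `.xlsm`, `.xltx`, and `.xltm` formats.

Here are a few possible solutions:

1. **Check the file extension**:
Make sure the path passed to `load_workbook` is correct and that the file extension is `.xlsx` or another format that `openpyxl` supports.

2. **Make sure the file is not corrupted**:
Try opening the file manually to see whether it opens normally. If the file is corrupted, `openpyxl` cannot read it either.

3. **Convert the file format**:
If your file is in another format (for example `.xls`), you can convert it with `pandas`. Here is a simple example that converts `.xls` to `.xlsx`:

import pandas as pd

# Read every sheet of the .xls file (pandas needs the xlrd package for this)
sheets = pd.read_excel('your_file.xls', sheet_name=None)
# Save as an .xlsx file
with pd.ExcelWriter('your_file.xlsx', engine='openpyxl') as writer:
    for sheet_name, data in sheets.items():
        data.to_excel(writer, sheet_name=sheet_name, index=False)

This produces a compatible `.xlsx` file that you can then process with `openpyxl`.

4. **Check that the file path is correct**:
Make sure the path is correct and the file exists. For example:

import os
from openpyxl import load_workbook

file_path = 'path/to/your_file.xlsx'
# Check whether the file exists
if os.path.exists(file_path):
    wb = load_workbook(filename=file_path)
    # Continue with the rest of the processing
else:
    print(f"File {file_path} does not exist.")

These steps should help you find and fix the problem. If you can share more specific code or a more detailed description, the issue can be narrowed down more precisely.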
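
In the traceback above, the message reads `openpyxl does not support  file format` with nothing between "support" and "file", which suggests that the path handed to `load_workbook` had no extension at all. A small pre-check along these lines can make that visible before `openpyxl` is involved. The function name `describe_excel_path` and the sample paths are hypothetical, and the `zipfile.is_zipfile` test relies on `.xlsx` files being ZIP archives.

```python
import os
import zipfile

SUPPORTED = {".xlsx", ".xlsm", ".xltx", ".xltm"}  # as listed in the error message


def describe_excel_path(path):
    """Return a short diagnosis of why a path may not load as a workbook."""
    ext = os.path.splitext(path)[-1].lower()
    if ext not in SUPPORTED:
        return f"unsupported extension {ext!r}"
    if not os.path.isfile(path):
        return "file does not exist"
    if not zipfile.is_zipfile(path):
        return "not a ZIP archive, so probably not a real .xlsx file"
    return "looks loadable"


print(describe_excel_path("your_excel_file"))                 # no extension
print(describe_excel_path("report.xls"))                      # old binary format
print(describe_excel_path("no_such_file_for_this_demo.xlsx")) # right extension, no file
```

Checking the extension first mirrors the order in which the failure appears in the traceback: the exception is raised while the archive is being validated, before any workbook content is read.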


pallas · posted on 2024-6-22 12:13:48

FishC posted on 2024-6-22 12:10
OK, it looks like your code hit a file format problem while trying to read the Excel file. The error tells us that `openpyxl` only supports `.xls ...

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\extract.py", line 174, in <module>
wb = load_workbook(filename=file_path)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 94, in _validate_archive
raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support  file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm

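FishC · posted on 2024-6-22 12:14:00

pallas posted on 2024-06-22 12:13
Traceback (most recent call last):
File "D:\PyC ...

Well, it looks like we are still going in circles around the "file format mystery". No problem, let's get to the bottom of it! The error states clearly that `openpyxl` does not support the format of the file you are processing. If you are completely sure the file is `.xlsx` or `.xlsm`, the cause may be a corrupted file or a path problem.

Here are the places where things may be going wrong, with fixes:

1. **Check the file path and extension:**
- Make sure the path is correct, the file exists, and the extension really is one of `.xlsx`, `.xlsm`, `.xltx`, or `.xltm`.
- Try opening the file manually; if Excel cannot open it either, the file is probably corrupted.

2. **Check whether the file is corrupted or still open:**
- Make sure the file is not in use by another process, such as Excel itself.

3. **Try reading another, known-good `.xlsx` file:**
- Create a simple `.xlsx` file and see whether the code can read it. This confirms whether the problem is specific to one file.

4. **Check the `openpyxl` version:**
- Make sure `openpyxl` is up to date. A very old version may cause compatibility problems.

Here is a simple example that reads an ordinary Excel file:

from openpyxl import load_workbook
from openpyxl.utils.exceptions import InvalidFileException

file_path = 'path_to_your_file.xlsx'
try:
    wb = load_workbook(filename=file_path)
    print("Workbook loaded successfully")
except InvalidFileException as e:
    print(f"Error: {e}")
except FileNotFoundError:
    print("The specified file was not found.")

5. **If the file is not in `.xlsx` format:**
- If the file is actually `.xls` (the old Excel format), you can handle it with the `xlrd` library. Install it first:

pip install xlrd

Then read the file:

import xlrd

file_path = 'path_to_your_file.xls'
try:
    wb = xlrd.open_workbook(file_path)
    print("Workbook loaded successfully with xlrd")
except xlrd.biffh.XLRDError as e:
    print(f"Error: {e}")
except FileNotFoundError:
    print("The specified file was not found.")

I hope these steps help you find the problem! If you still have questions, just let me know!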


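pallas · posted on 2024-6-22 12:17:58

FishC posted on 2024-6-22 12:14
Well, it looks like we are still going in circles around the "file format mystery". No problem, let's get to the bottom of it! The error states clearly that `openpyx ...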

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\extract.py", line 195, in <module>
with pd.ExcelWriter('output_file.xlsx', engine='openpyxl') as writer:
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1370, in __exit__
self.close()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1374, in close
self._save()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 110, in _save
self.book.save(self._handles.handle)
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\zipfile.py", line 1308, in __init__
self.start_dir = self.fp.tell()
                  ^^^^^^^^^^^^^^
ValueError: I/O operation on closed file

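FishC · posted on 2024-6-22 12:18:09

pallas posted on 2024-06-22 12:17
Traceback (most recent call last):
File "D:\PyC ...

It looks like we are now facing a "file handling exception". This error usually occurs because the file was closed too early or was interfered with by another part of the code. Don't worry, I will walk you through checking and fixing it step by step.

Below is a simple example that shows how to read multiple tables from one Excel file (all in the same sheet) and write them to a new file. It uses the `pandas` library: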

import pandas as pd

# The Excel file to read
file_path = 'your_file.xlsx'
sheet_name = 'Sheet1'
# Read the specified sheet with pandas
df = pd.read_excel(file_path, sheet_name=sheet_name)
# Assume the tables are separated by blank rows, so find those rows
empty_rows = df[df.isnull().all(axis=1)].index.tolist()
# Split the data into tables
tables = []
start_idx = 0
for end_idx in empty_rows:
    table = df.iloc[start_idx:end_idx].dropna(how='all')  # drop all-empty rows
    tables.append(table)
    start_idx = end_idx + 1  # skip the blank row
# Add the last part if it is not empty
if start_idx < len(df):
    table = df.iloc[start_idx:].dropna(how='all')
    tables.append(table)
# Write each table to its own sheet in a new Excel file
with pd.ExcelWriter('output_file.xlsx', engine='openpyxl') as writer:
    for i, table in enumerate(tables):
        table.to_excel(writer, sheet_name=f'Table_{i+1}', index=False)
print("Done!")

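Please note the following:

1. Replace the file path (`file_path`) and the worksheet name (`sheet_name`) with your own.
2. The code assumes that the tables within the sheet are separated by completely blank rows. If your layout is different, adjust the splitting logic accordingly.
3. Make sure the new Excel file `output_file.xlsx` is not open and is not in use by another program.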
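
The `ValueError: I/O operation on closed file` in the traceback above is raised from `__exit__`, which is consistent with the writer having been closed once already inside the `with` block (for example by an explicit `writer.close()`) and then closed again on exit. A minimal sketch of the pattern that avoids this, using an in-memory buffer and made-up sample tables:

```python
from io import BytesIO

import pandas as pd

tables = [
    pd.DataFrame({"name": ["apple", "pear"], "qty": [3, 5]}),
    pd.DataFrame({"city": ["Oslo"], "population": [700000]}),
]

buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    for i, table in enumerate(tables, start=1):
        table.to_excel(writer, sheet_name=f"Table_{i}", index=False)
    # No writer.close() or writer.save() here: leaving the block saves once

# Read everything back: sheet_name=None returns a dict of sheet name -> DataFrame
buffer.seek(0)
sheets = pd.read_excel(buffer, sheet_name=None)
print(list(sheets))
```

If the writer is created without a `with` statement instead, a single explicit `writer.close()` at the end plays the same role.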

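Now, let's check:
- Did you pass in the correct file path?
- Is the file extension `.xlsx` rather than some other format?
- Is the target file free of other programs while the code runs?

I hope this helps you solve the problem! If you have more questions, feel free to keep asking!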


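pallas · posted on 2024-6-22 12:20:54

FishC posted on 2024-6-22 12:18
It looks like we are now facing a "file handling exception". This error usually occurs because the file was closed too early or was interfered with by another part of the code. ...

This one works.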

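FishC · posted on 2024-6-22 12:22:12

pallas posted on 2024-06-22 12:20
This one works.

Now that the earlier problem is solved, let's look at how to read multiple tables from one Excel sheet. It is as if you had drawn several pictures on one huge sheet of paper, each with its own set of data. We need to find those data ranges and process them one by one.

We can use `pandas` to read the Excel file, together with some logic to determine where each table starts and ends. Here is sample code that may help: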

import pandas as pd

# The Excel file path and sheet name to read
file_path = 'your_excel_file.xlsx'
sheet_name = 'Sheet1'
# Read the whole sheet
df = pd.read_excel(file_path, sheet_name=sheet_name, header=None)


def find_tables(data):
    """Return (start, end) row ranges of consecutive non-blank rows."""
    tables = []
    start_row = None
    for i, row in data.iterrows():
        if not row.isnull().all():
            if start_row is None:
                start_row = i
        else:
            if start_row is not None:
                tables.append((start_row, i - 1))
                start_row = None
    if start_row is not None:
        tables.append((start_row, len(data) - 1))
    return tables


# Find all table regions
tables = find_tables(df)
# Process each table
for (start, end) in tables:
    # Assume the tables are separated by blank rows
    table_df = df.iloc[start:end + 1]
    # Each table can be processed further here
    print(f"Table from row {start} to {end}")
    print(table_df)
    print("\n")
# Adjust find_tables to match how your tables are actually laid out.

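This approach is simple and blunt: it assumes that the tables are separated by blank rows, and it can be adapted to your situation. If there are no clear blank rows or other separators between the tables, more elaborate logic may be needed to identify them.

The core of the code is the `find_tables` function, which walks through the DataFrame to find the first and last row of each table. Once the tables are identified, each one can be processed further, for example printed or analyzed.

### Tips:

1. Make sure your Excel file is in the right format (.xlsx or .xls).
2. Use `header=None` so that every row is read in and the header rows are not lost.
3. Adjust the `find_tables` function to the actual structure and separators of your tables.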
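
Because `header=None` keeps each table's header as an ordinary first row, a follow-up step is usually needed to promote that row to column names. The sketch below shows one way to do both steps on an in-memory DataFrame; it swaps the row-by-row loop for a `cumsum`/`groupby` split, and the function names `split_blocks` and `promote_header` are made up for illustration.

```python
import pandas as pd


def split_blocks(df):
    """Split a DataFrame into blocks of consecutive rows separated by all-NaN rows."""
    blank = df.isnull().all(axis=1)
    block_id = blank.cumsum()  # increases by one at every blank row
    kept = df[~blank]
    return [block.reset_index(drop=True)
            for _, block in kept.groupby(block_id[~blank])]


def promote_header(block):
    """Use the block's first row as column names and return the remaining rows."""
    body = block.iloc[1:].reset_index(drop=True)
    body.columns = block.iloc[0].tolist()
    return body


# Stand-in for pd.read_excel(..., header=None) on a sheet holding two tables
raw = pd.DataFrame([
    ["name", "qty"],
    ["apple", 3],
    [None, None],
    ["city", "population"],
    ["Oslo", 700000],
])
tables = [promote_header(block) for block in split_blocks(raw)]
for table in tables:
    print(table)
```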

I hope this code helps. Remember, data processing is like a puzzle game: taken one step at a time, the answer always turns up. Keep going!


pallas · posted on 2024-6-22 12:22:54

FishC posted on 2024-6-22 11:58
OK, it looks like we need to correct how `pandas.ExcelWriter` is used and how it saves. `pandas.ExcelWriter` ...

But this one still raises an error:
Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 38, in main
worksheet.append(header)
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'append'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 66, in <module>
main(files, output_file, separators)
  File "D:\PyCharm\excel_merge\excel_merge\.venv\Lib\site-packages\pip-23.2.1.dist-info\finish.py", line 33, in main
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1370, in __exit__
self.close()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_base.py", line 1374, in close
self._save()
  File "D:\anaconda3\Lib\site-packages\pandas\io\excel\_openpyxl.py", line 110, in _save
self.book.save(self._handles.handle)
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
  File "D:\anaconda3\Lib\site-packages\openpyxl\writer\excel.py", line 89, in write_data
archive.writestr(ARC_WORKBOOK, writer.write())
                                 ^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 148, in write
self.write_views()
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 135, in write_views
active = get_active_sheet(self.wb)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\openpyxl\workbook\_writer.py", line 33, in get_active_sheet
raise IndexError("At least one sheet must be visible")
IndexError: At least one sheet must be visible
