Error when reading a txt file and converting it to an Excel file

颂风侯 · Posted on 2024-11-4 10:33:03


I wrote a script to convert a txt file to Excel, but it fails. The code is below.

"""
The code is as follows:
```python
"""
import pandas as pd
import tkinter as tk
from tkinter import filedialog
from tkinter import ttk
import time
from openpyxl import load_workbook

input_file_path = ""
tem_file_path = ""
tar_folder_path = ""
df_merge = []

def select_input_file():
    global input_file_path
    input_file_path = filedialog.askopenfilename(filetypes=[("text files", "*.txt")], title="请选择txt，.txt格式")
    if input_file_path:
        input_file.set(input_file_path)
        input_file_label.config(text=input_file_path.split('/')[-1])

def select_target_folder():
    global tar_folder_path
    tar_folder_path = filedialog.askdirectory(title="请选择输出文件夹")
    if tar_folder_path:
        target_folder.set(tar_folder_path)
        target_folder_label.config(text=tar_folder_path.split('/')[-1])

def run():
    global input_file_path
    global tem_file_path
    global tar_folder_path
    global df_merge
    status_label.config(text="正在运行中，请稍候")
    root.update()
    time.sleep(2)  # simulate command execution time
    status_label.config(text="已完成")
    input_file = input_file_path
    template_file = tem_file_path
    target_folder = tar_folder_path
    # Read the txt file and convert it to a DataFrame
    with open(input_file, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    data = [line.strip().split() for line in lines]
    columns = ['管线号', '焊缝编号', '焊接类型', '主要信息', '管线寸口']
    df = pd.DataFrame(data, columns=columns)
    # Strip the first character from the '焊缝编号' column
    df['焊缝编号'] = df['焊缝编号'].str[1:]
    # Replace the values in the '焊接类型' column
    df['焊接类型'] = df['焊接类型'].replace(
        {'BW': '对焊', 'LET': '开口焊', 'SW': '承插焊', 'SOF': '承插焊', 'SOB': '承插焊'})
    df['主管外径'] = df['主要信息'].apply(lambda x: x[1:x.find('*')])
    df['支管外径'] = df['主管外径']
    df['壁厚'] = df['主要信息'].apply(lambda x: x[x.find('*') + 1:x.find(',')])
    df['支管壁厚'] = df['壁厚']
    df['焊缝前材质'] = df['主要信息'].apply(lambda x: x[x.find(',') + 1:]) + "-"
    df['焊缝后材质'] = df['焊缝前材质']
    df['弯头焊点标识'] = "-"
    df['管线寸口'] = ""
    # Write the DataFrame to an Excel file
    excel_file = f"{target_folder}/转换后excel.xlsx"
    df.to_excel(excel_file, index=False)

root = tk.Tk()
root.title("江苏瑞鼎 CADWORK的txt文件转excel程序")
root.geometry("600x500+100+100")
# Lock window resizing; takes two bools (width, height), True means resizable.
root.resizable(False, False)
# Keep the window on top of other windows
root.attributes("-topmost", True)
input_file = tk.StringVar()
template_file = tk.StringVar()
target_folder = tk.StringVar()
# Create the Button widgets; note that command takes the function name, without ()
tk.Button(root, text="1、选择txt格式，格式为.txt", width=32, command=select_input_file, font=("黑体", 20)).place(x=20, y=20)
input_file_label = tk.Label(root, text="", width=40, font=("楷体", 20))
input_file_label.place(x=20, y=70)
tk.Button(root, text="2、选择输出位置，格式为文件夹", width=35, command=select_target_folder, font=("黑体", 20)).place(
    x=20, y=120)
target_folder_label = tk.Label(root, text="", width=40, font=("楷体", 20))
target_folder_label.place(x=20, y=170)
# Add a Run button
tk.Button(root, text="3、前面完成输入，按此运行", width=30, command=run, font=("黑体", 20)).place(
    x=20, y=220)
status_label = tk.Label(root, text="等待中", width=30, font=("楷体", 20))
status_label.place(x=20, y=270)
version_label = tk.Label(root, text="软件版本：V1.0 20241031", width=30, font=("宋体", 20)).place(x=20, y=350)
author_label = tk.Label(root, text="制作：朱宁", width=30, font=("宋体", 20)).place(x=20, y=400)
# Start the main event loop
root.mainloop()


The error seems to be tkinter hitting a type error:
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py", line 1967, in __call__
    return self.func(*args)
           ^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\cadworx_excel.py", line 57, in run
    lines = f.readlines()
            ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 15: invalid start byte

FishC · Posted on 2024-11-4 10:33:16

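Based on the error message you provided, this is an encoding problem rather than a type error: it occurs while reading the txt file. The message `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 15: invalid start byte` means the file contains a byte that cannot be decoded as UTF-8. In UTF-8, 0xb9 is a continuation byte and can never begin a character, which suggests the file was saved in some other encoding.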
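
The failure can be reproduced without any file. The sketch below assumes the txt file was saved as GBK, which is only a guess based on the byte in the traceback; the sample string is made up and is not taken from the poster's data.

```python
# Hypothetical sample text; the real file contents are not shown in the thread.
text = "管线号"

# Assumption: the txt file on disk was written with the GBK encoding.
raw = text.encode("gbk")

try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(f"utf-8 failed: {exc.reason} at byte offset {exc.start}")

# Decoding with the encoding the bytes were written in round-trips cleanly.
restored = raw.decode("gbk")
print(restored == text)
```

If the same pattern shows up with the real file, the fix is to name the file's actual encoding in `open()`.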

Here are some possible solutions:

1. **Try a different encoding**: if the txt file is not UTF-8, try another common encoding such as GBK or ISO-8859-1.

with open(input_file, 'r', encoding='gbk') as f:
    lines = f.readlines()


2. **Detect the encoding automatically**: use the `chardet` library to detect the file's encoding.

First install `chardet`:

pip install chardet


Then use it in your code:

import chardet
with open(input_file, 'rb') as f:
    raw_data = f.read()
result = chardet.detect(raw_data)
encoding = result['encoding']
with open(input_file, 'r', encoding=encoding) as f:
    lines = f.readlines()


3. **Ignore the errors**: if the file contains only a few undecodable characters, you can choose to ignore them.

with open(input_file, 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()
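
One caveat on option 3: `errors='ignore'` only hides the exception. If the whole file is in a different encoding, the text is silently damaged instead. A small sketch, assuming a GBK-encoded line (the sample line is made up):

```python
# Hypothetical line; assume the file on disk is GBK-encoded.
text = "管线号 W1"
raw = text.encode("gbk")

# Wrong codec plus errors="ignore": no exception is raised, but the bytes
# that are not valid UTF-8 are dropped or misread.
damaged = raw.decode("utf-8", errors="ignore")

print(repr(damaged))
print("管线号" in damaged)  # checks whether the Chinese field survived
print("W1" in damaged)      # checks whether the ASCII part survived
```

So this option is best kept for files that really are UTF-8 with a few stray bytes.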


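Try one of the methods above and see whether it solves the problem. If the problem persists, check the actual encoding of the txt file and make sure you read it with the correct one.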
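
When the encoding is not known in advance, one way to apply option 1 is to try a short list of candidates in order. This is a minimal sketch, not part of the original code: the helper name `read_lines_with_fallback` and the default candidate list are assumptions.

```python
from pathlib import Path


def read_lines_with_fallback(path, encodings=("utf-8", "gbk")):
    """Return (lines, encoding) for the first candidate that decodes cleanly.

    `encodings` is an assumed candidate list, tried in order with strict errors.
    """
    raw = Path(path).read_bytes()
    for name in encodings:
        try:
            return raw.decode(name).splitlines(), name
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

In `run()`, `lines, encoding = read_lines_with_fallback(input_file)` could stand in for the `open`/`readlines` pair. UTF-8 goes first because it rejects most non-UTF-8 input. ISO-8859-1 is left out of the defaults on purpose: it maps every byte to some character, so it never fails and would hide a wrong guess.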

The reply above comes from the FishC AI assistant. If it did not answer your question correctly, please ask a follow-up.

颂风侯 · Posted on 2024-11-4 10:43:33

Exception in Tkinter callback
Traceback (most recent call last):
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\internals\construction.py", line 939, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\internals\construction.py", line 986, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 5 columns passed, passed data had 6 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\tkinter\__init__.py", line 1967, in __call__
    return self.func(*args)
           ^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\cadworx_excel.py", line 69, in run
    df = pd.DataFrame(data, columns=columns)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\frame.py", line 851, in __init__
    arrays, columns, index = nested_data_to_arrays(
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\internals\construction.py", line 520, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\internals\construction.py", line 845, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\PYTHON_NEW\pack20240926\pythonProject1\.venv\Lib\site-packages\pandas\core\internals\construction.py", line 942, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 5 columns passed, passed data had 6 columns

Above is the error I get after the change.

"""
First, we use Python's tkinter library to create a simple graphical interface that lets the user choose a txt file and an output folder. Then we use pandas to read the txt file into a DataFrame and write the data to an Excel file.
Breakdown:
1. Import the required libraries: tkinter, pandas, openpyxl
2. Create the GUI so the user can choose the txt file and the output folder
3. Read the txt file and convert it to a DataFrame
4. Write the DataFrame to an Excel file
The code is as follows:
```python
"""
import pandas as pd
import tkinter as tk
from tkinter import filedialog
from tkinter import ttk
import time
from openpyxl import load_workbook
import chardet

input_file_path = ""
tem_file_path = ""
tar_folder_path = ""
df_merge = []

def select_input_file():
    global input_file_path
    input_file_path = filedialog.askopenfilename(filetypes=[("text files", "*.txt")], title="请选择txt，.txt格式")
    if input_file_path:
        input_file.set(input_file_path)
        input_file_label.config(text=input_file_path.split('/')[-1])

def select_target_folder():
    global tar_folder_path
    tar_folder_path = filedialog.askdirectory(title="请选择输出文件夹")
    if tar_folder_path:
        target_folder.set(tar_folder_path)
        target_folder_label.config(text=tar_folder_path.split('/')[-1])

def run():
    global input_file_path
    global tem_file_path
    global tar_folder_path
    global df_merge
    status_label.config(text="正在运行中，请稍候")
    root.update()
    time.sleep(2)  # simulate command execution time
    status_label.config(text="已完成")
    input_file = input_file_path
    template_file = tem_file_path
    target_folder = tar_folder_path
    # Read the txt file and convert it to a DataFrame
    with open(input_file, 'rb') as f:
        raw_data = f.read()
    result = chardet.detect(raw_data)
    encoding = result['encoding']
    with open(input_file, 'r', encoding=encoding, errors='ignore') as f:
        lines = f.readlines()
    data = [line.strip().split() for line in lines]
    columns = ['管线号', '焊缝编号', '焊接类型', '主要信息', '管线寸口']
    df = pd.DataFrame(data, columns=columns)
    # Strip the first character from the '焊缝编号' column
    df['焊缝编号'] = df['焊缝编号'].str[1:]
    # Replace the values in the '焊接类型' column
    df['焊接类型'] = df['焊接类型'].replace(
        {'BW': '对焊', 'LET': '开口焊', 'SW': '承插焊', 'SOF': '承插焊', 'SOB': '承插焊'})
    df['主管外径'] = df['主要信息'].apply(lambda x: x[1:x.find('*')])
    df['支管外径'] = df['主管外径']
    df['壁厚'] = df['主要信息'].apply(lambda x: x[x.find('*') + 1:x.find(',')])
    df['支管壁厚'] = df['壁厚']
    df['焊缝前材质'] = df['主要信息'].apply(lambda x: x[x.find(',') + 1:]) + "-"
    df['焊缝后材质'] = df['焊缝前材质']
    df['弯头焊点标识'] = "-"
    # df['管线寸口'] = ""
    # Write the DataFrame to an Excel file
    excel_file = f"{target_folder}/转换后excel.xlsx"
    df.to_excel(excel_file, index=False)

root = tk.Tk()
root.title("江苏瑞鼎 CADWORK的txt文件转excel程序")
root.geometry("600x500+100+100")
# Lock window resizing; takes two bools (width, height), True means resizable.
root.resizable(False, False)
# Keep the window on top of other windows
root.attributes("-topmost", True)
input_file = tk.StringVar()
template_file = tk.StringVar()
target_folder = tk.StringVar()
# Create the Button widgets; note that command takes the function name, without ()
tk.Button(root, text="1、选择txt格式，格式为.txt", width=32, command=select_input_file, font=("黑体", 20)).place(x=20, y=20)
input_file_label = tk.Label(root, text="", width=40, font=("楷体", 20))
input_file_label.place(x=20, y=70)
tk.Button(root, text="2、选择输出位置，格式为文件夹", width=35, command=select_target_folder, font=("黑体", 20)).place(
    x=20, y=120)
target_folder_label = tk.Label(root, text="", width=40, font=("楷体", 20))
target_folder_label.place(x=20, y=170)
# Add a Run button
tk.Button(root, text="3、前面完成输入，按此运行", width=30, command=run, font=("黑体", 20)).place(
    x=20, y=220)
status_label = tk.Label(root, text="等待中", width=30, font=("楷体", 20))
status_label.place(x=20, y=270)
version_label = tk.Label(root, text="软件版本：V1.0 20241031", width=30, font=("宋体", 20)).place(x=20, y=350)
author_label = tk.Label(root, text="制作：朱宁", width=30, font=("宋体", 20)).place(x=20, y=400)
# Start the main event loop
root.mainloop()


FishC · Posted on 2024-11-4 10:45:27

颂风侯 posted on 2024-11-04 10:43
Exception in Tkinter callback
Traceback (most rec ...

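Based on the error message you provided, the problem occurs while constructing the `pandas.DataFrame`. The message `ValueError: 5 columns passed, passed data had 6 columns` means the number of column names does not match the number of columns in the data: `columns` lists 5 names, but at least one row in `data` has 6 fields.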

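Here are some possible solutions:

1. **Check the number of fields and columns**: make sure every row in `data` has the same number of fields as there are names in `columns`.

2. **Print debugging information**: before creating the DataFrame, print `data` and `columns` and confirm that their structure and counts are what you expect.

3. **Infer the column names automatically**: if the first line of the txt file holds the column names, `pandas` can infer them for you.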

df = pd.read_csv(input_file, sep='\t', encoding=encoding, on_bad_lines='skip')


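4. **Adjust the column names manually**: if the names and the data do not match, change `columns` so that it matches the actual number of columns in the data.

5. **Check the data format**: make sure the data in the txt file is well formed, and in particular that the separator is used consistently.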
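
The checks in points 1, 2 and 5 can be sketched as a small helper that reports which lines do not split into the expected number of fields. The helper name `split_rows` and the sample lines are made up for illustration, since the real file layout is not shown in the thread.

```python
from collections import Counter

EXPECTED_FIELDS = 5  # len(columns) in the poster's code


def split_rows(lines, expected=EXPECTED_FIELDS):
    """Split lines on whitespace and return (good_rows, bad_rows).

    bad_rows holds (line_number, fields) for every non-blank line whose
    field count differs from `expected`.
    """
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == expected:
            good.append(fields)
        else:
            bad.append((number, fields))
    return good, bad


# Made-up sample lines, not the poster's data.
sample = [
    "P-101 W1 BW D60*4,20# 2\n",
    "P-101 W2 SW D60*4,20 # 2\n",  # a stray space produces a sixth field
    "\n",
]
good, bad = split_rows(sample)
widths = Counter(len(fields) for _, fields in bad)
print(len(good), bad, dict(widths))
```

Printing `bad` for the real file should show which lines carry the extra field, and from there whether the fix belongs in the data, in the separator, or in `columns`.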

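Please check and adjust your code based on the suggestions above. If the problem persists, please share more details about the format of the txt file so that we can help further.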

