Help request, Newbie Discussion Area, Newbie Training Camp, FishC Forum

一介书生423 posted on 2023-11-3 08:11:20

Help request

# coding=utf-8
# import pandas as pd
# may_df = pd.read_csv('may.csv')
# june_df = pd.read_csv('june.csv')
#
# merged = pd.merge(may_df, june_df, on="states", how='outer', suffixes=('_ahead', '_behind'), indicator=True)
# # find changed rows
# changed_rows = merged[merged['_merge'] != 'both']
# # find new rows
# new_rows = merged[merged['_merge'] == 'right_only']
# print(new_rows)
#

import pandas as pd
from tqdm import tqdm

# # Read the two tables
# df1 = pd.read_csv('may.csv')
# df2 = pd.read_csv('june.csv')
#
# # Pick the rows whose location, project_name, building, unit, high and room match
# cols = ['location', 'project_name', 'building', 'unit', 'high', 'room']
# df1_grouped = df1.groupby(cols).first().reset_index()
# df2_grouped = df2.groupby(cols).first().reset_index()
#
# # Compare df1_grouped with df2_grouped row by row and check the states column
# df = pd.DataFrame(columns=df1.columns)
# for index, row1 in tqdm(df1_grouped.iterrows(), total=len(df1_grouped)):
#     for _, row2 in df2_grouped.iterrows():
#         if row1['location'] == row2['location'] and \
#                 row1['project_name'] == row2['project_name'] and \
#                 row1['building'] == row2['building'] and \
#                 row1['unit'] == row2['unit'] and \
#                 row1['high'] == row2['high'] and \
#                 row1['room'] == row2['room'] and \
#                 row1['states'] != row2['states']:
#             # DataFrame.append was removed in pandas 2.0, so use concat instead
#             df = pd.concat([df, row1.to_frame().T], ignore_index=True)
#
# # Print the result
# print(df)

import pandas as pd
from tqdm import tqdm

# Read the two tables (raw strings keep the backslashes in the Windows paths literal)
df1 = pd.read_excel(r'D:\一介书生资料库\爬虫：八爪鱼\各市县整体市场\shujuchuli\八月三亚.xlsx')
df2 = pd.read_excel(r'D:\一介书生资料库\爬虫：八爪鱼\各市县整体市场\shujuchuli\九月三亚.xlsx')

# Pick the rows whose location, project_name, building, unit, high and room match, and select the listed columns
cols = ['区域', '项目名称', '楼盘', '单元', '楼层', '房间', '建筑面积', '房型', '挂牌清水价', '挂牌装修价']
df1_grouped = df1.groupby(cols).first().reset_index()
df2_grouped = df2.groupby(cols).first().reset_index()

# Combine the two tables and keep the rows whose states changed
df_merged = pd.concat([df1_grouped, df2_grouped])
df_duplicates = df_merged
df_changed = df_duplicates.loc[df_duplicates['状态_1'] != df_duplicates['状态_2']]

# Write out the result
df_changed.to_excel(r'D:\pydata\data.xlsx', index=False)
print(df_changed)

D:\PythonProject\pythonProject\Scripts\python.exe E:\qycache\xuexi\pythonProject\房地产\process（版本2.3）.py
D:\PythonProject\pythonProject\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\PythonProject\pythonProject\lib\site-packages\numpy\.libs\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll
D:\PythonProject\pythonProject\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:"
Traceback (most recent call last):
  File "D:\PythonProject\pythonProject\lib\site-packages\pandas\core\indexes\base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '状态_1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\qycache\xuexi\pythonProject\房地产\process（版本2.3）.py", line 57, in <module>
    df_changed = df_duplicates.loc[df_duplicates['状态_1'] != df_duplicates['状态_2']]
  File "D:\PythonProject\pythonProject\lib\site-packages\pandas\core\frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\PythonProject\pythonProject\lib\site-packages\pandas\core\indexes\base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: '状态_1'

Process finished with exit code 1

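isdkz posted on 2023-11-3 08:11:46

Hello! Based on the code and error message you provided, the problem is in this line:

df_changed = df_duplicates.loc[df_duplicates['状态_1'] != df_duplicates['状态_2']]

The error message is KeyError: '状态_1', which means df_duplicates has no column named 状态_1.

Your code reads the Excel files with pd.read_excel and a file path, but nothing in it ever defines a column called 状态_1.

To fix this, check whether the Excel files actually contain a column named 状态_1. If they do, make sure the name is spelled exactly the same way and adjust the code to match.

It is also worth checking that the data in the Excel files is correct and consistent.

I hope this helps! If you have any other questions, feel free to ask.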
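
To make the missing column concrete, here is a minimal sketch of why 状态_1 may never come into existence. It assumes both sheets carry a column named 状态 and uses two tiny made-up tables (the room numbers and status values are hypothetical). pd.concat stacks rows and keeps the original column names, while a merge on a key column places the two months side by side and adds the suffixes.

```python
import pandas as pd

# Hypothetical miniature versions of the two monthly tables.
aug = pd.DataFrame({"房间": [101, 102], "状态": ["available", "available"]})
sep = pd.DataFrame({"房间": [101, 102], "状态": ["available", "sold"]})

# concat stacks the rows; no suffixed columns are created.
stacked = pd.concat([aug, sep])
print(list(stacked.columns))  # ['房间', '状态']

# merge aligns rows on the key; the overlapping non-key column gets the suffixes.
merged = aug.merge(sep, on="房间", suffixes=("_1", "_2"))
print(list(merged.columns))  # ['房间', '状态_1', '状态_2']

# Only now can the two status columns be compared row by row.
changed = merged[merged["状态_1"] != merged["状态_2"]]
print(changed)
```

If the real tables behave like this, replacing the concat step with a merge on the identifying columns should make 状态_1 and 状态_2 available.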

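Mike_python小 posted on 2023-11-3 08:12:52

Checking whether matching rows in two Excel files have changed

First, we import the pandas and tqdm libraries: pandas handles tabular data, and tqdm can show a progress bar so we can follow how far the program has run.

Next, we read the two tables and pick the rows whose specified columns match. The specified columns here are '区域', '项目名称', '楼盘', '单元', '楼层', '房间'.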

import pandas as pd
from tqdm import tqdm

# Read the two tables (raw strings keep the backslashes in the Windows paths literal)
df1 = pd.read_excel(r'D:\一介书生资料库\爬虫：八爪鱼\各市县整体市场\shujuchuli\八月三亚.xlsx')
df2 = pd.read_excel(r'D:\一介书生资料库\爬虫：八爪鱼\各市县整体市场\shujuchuli\九月三亚.xlsx')

# Pick the rows whose specified columns match, keeping the first row of each group
cols = ['区域', '项目名称', '楼盘', '单元', '楼层', '房间']
df1_grouped = df1.groupby(cols).first().reset_index()
df2_grouped = df2.groupby(cols).first().reset_index()
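
A side note on the grouping step, offered as a sketch rather than a diagnosis: by default groupby drops every row whose key contains a missing value, so units with a blank cell in one of the key columns would silently disappear. The table below is made up (a blank 单元 cell is an assumption about the real data); drop_duplicates is shown as one alternative that keeps such rows.

```python
import pandas as pd

cols = ["项目名称", "单元", "房间"]  # shortened, hypothetical key list

# Hypothetical data: the first two rows have an empty 单元 cell.
df = pd.DataFrame({
    "项目名称": ["A", "A", "A"],
    "单元": [None, None, "1"],
    "房间": [101, 101, 102],
    "状态": ["available", "available", "sold"],
})

# groupby discards groups whose key contains a missing value (default dropna=True).
grouped = df.groupby(cols).first().reset_index()

# drop_duplicates keeps the first row per key and does not discard missing keys.
deduped = df.drop_duplicates(subset=cols, keep="first")

print(len(grouped))  # 1
print(len(deduped))  # 2
```

Comparing len(df1) with len(df1_grouped) on the real sheets would show whether any rows are being lost at this step.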

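Then we merge the two tables and keep the rows whose states (状态) column changed.

# Merge the two tables on the key columns; the shared 状态 column becomes 状态_1 and 状态_2
# (pd.merge is used here because pd.concat would only stack the rows and create no suffixed columns)
df_merged = pd.merge(df1_grouped, df2_grouped, on=cols, suffixes=('_1', '_2'))

# Keep the rows whose states column changed
df_duplicates = df_merged
df_changed = df_duplicates.loc[df_duplicates['状态_1'] != df_duplicates['状态_2']]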

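Finally, we can write the result to an Excel file.

# Write the result to an Excel file
df_changed.to_excel(r'D:\pydata\data.xlsx', index=False)

# Print the rows that changed
print(df_changed)

Note that when filtering with loc, the column names must match the column names in the table exactly; otherwise a KeyError is raised.

Please mark this as the best answer, thanks! It means a lot to me!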
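
One way to catch a name mismatch before it turns into a KeyError is to compare the expected names with df.columns up front. This is a small sketch; missing_columns is a hypothetical helper, not a pandas function, and the sample frames are made up.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the names in `required` that are not columns of `df`."""
    return [name for name in required if name not in df.columns]

df = pd.DataFrame({"房间": [101], "状态": ["sold"]})
print(missing_columns(df, ["状态_1", "状态_2"]))  # ['状态_1', '状态_2']
print(missing_columns(df, ["状态"]))  # []

# Headers read from a spreadsheet sometimes carry stray spaces; stripping them
# makes the names match what the code expects.
padded = pd.DataFrame({" 状态 ": ["sold"]})
cleaned = padded.rename(columns=str.strip)
print(list(cleaned.columns))  # ['状态']
```

Printing list(df.columns) right after the read step gives the same information by eye.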

一介书生423 posted on 2023-11-3 08:17:22

isdkz posted on 2023-11-3 08:11
Hello! Based on the code and error message you provided, the problem is in this line:

Empty DataFrame
Columns: [区域, 项目名称, 楼盘, 单元, 楼层, 房间, 建筑面积, 房型, 挂牌清水价, 挂牌装修价, 房屋用途, 产品类型, 状态, 商品房预售许可证, 发证日期]
Index: []

This is the output after the fix.
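
The thread does not show what was changed, so the cause of the empty result cannot be stated for certain. The column list above still has a single 状态 column, which would be consistent with the two months having been stacked rather than matched side by side. As a sketch under that assumption, the comparison could be done with an outer merge on the identifying columns plus indicator=True, which also picks up the newly listed rows. The key list and data here are shortened and made up.

```python
import pandas as pd

KEYS = ["项目名称", "房间"]  # shortened, hypothetical key list

# Hypothetical data for two consecutive months.
aug = pd.DataFrame({
    "项目名称": ["A", "A", "B"],
    "房间": [101, 102, 201],
    "状态": ["available", "available", "sold"],
})
sep = pd.DataFrame({
    "项目名称": ["A", "A", "B", "B"],
    "房间": [101, 102, 201, 202],
    "状态": ["available", "sold", "sold", "available"],
})

# Outer merge keeps units from both months; _merge records where each row came from.
merged = aug.merge(sep, on=KEYS, how="outer", suffixes=("_1", "_2"), indicator=True)

# Units present in both months whose status differs.
in_both = merged["_merge"] == "both"
changed = merged[in_both & (merged["状态_1"] != merged["状态_2"])]

# Units that appear only in the later month.
new_rows = merged[merged["_merge"] == "right_only"]

print(changed)
print(new_rows)
```

Note that the key list should contain only the columns that identify a unit. If it also includes columns that can change between months, such as the listing prices, the same unit will not pair up and will show as left_only and right_only instead of as a changed row.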
