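Getting a new DataFrame by filtering DataFrame 2 according to DataFrame 1: help wanted, thanks
Last edited by futui on 2023-5-6 09:04

I want to filter the corresponding columns of df2 using the values in df1's columns together with the instruction in its 条件 (condition) column, and end up with a new DataFrame.
For example, the 名称 (name) column of df1 holds the value A, and the 条件 value on that row is 排除 (exclude). Put together, that reads "exclude A", and "exclude A" is then used to filter the 名称 column of df2.
Any help would be appreciated, thanks!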
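To make the rule concrete, here is a minimal sketch of what a single instruction such as "exclude A" could look like as a boolean mask on a pandas Series. The toy Series `names` is made up for illustration, and reading "exclude" as "does not contain the substring" is an assumption; an exact-match reading would give a different result for values such as "BA".

```python
import pandas as pd

# Hypothetical toy column, not the real df2 data.
names = pd.Series(["A", "B", "FC", "BA"])

# Reading 1 (assumed): "exclude A" drops every value containing the substring "A".
not_containing_a = ~names.str.contains("A", regex=False)

# Reading 2: "exclude A" drops only the values exactly equal to "A".
not_equal_a = names != "A"

print(names[not_containing_a].tolist())  # ['B', 'FC']
print(names[not_equal_a].tolist())       # ['B', 'FC', 'BA']
```

For the sample data in this thread both readings happen to keep the same rows, so the choice only matters for other inputs.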
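import pandas as pd

# 条件 = condition: 包含 (contains), 排除 (exclude), 大于等于 (>=), 小于等于 (<=)
# 编号 = ID, 名称 = name, 日期 = date
df1 = pd.DataFrame({'条件': ['包含', '排除', '大于等于', '小于等于'],
                    '编号': ['212', '', '', ''],
                    '名称': ['', 'A', '', ''],
                    '日期': ['', '', '', ''],
                    'DD': ['', '', '', '3'],
                    'BB': ['', '', '5', '']})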
data = [["212345", "A", "2022-12-20", "2.1016", "6","2.8", "-0.77",78],
["021243", "C", "2022-12-15", "2.0891", "4","-0.77",98],
["519212", "B", "2022-12-10", "3.6", "68", "-0.77", "-3.29",567],
["121811", "E", "2022-12-9", "0.7071", "-1.64","25", "46",789],
["002124", "FC", "2022-11-19", "5", '8', "0.8","-1.46",834.8],
["16321208", "G", "2022-11-10", "1.0440", "83","" ,"840",0.568]]
columns = ["编号", "名称", "日期", "AA", "BB","CC","DD","EE"]
df2 = pd.DataFrame(data, columns=columns)
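
One thing that may be worth checking before filtering: the second row of `data` appears to list seven values for eight columns, so its trailing values may not land in the columns the author intended. The post does not say which value is missing, so the sketch below only detects the mismatch and does not try to repair it. The name `short_rows` is introduced here for illustration.

```python
# Rows and column names copied from the post above.
data = [["212345", "A", "2022-12-20", "2.1016", "6", "2.8", "-0.77", 78],
        ["021243", "C", "2022-12-15", "2.0891", "4", "-0.77", 98],
        ["519212", "B", "2022-12-10", "3.6", "68", "-0.77", "-3.29", 567],
        ["121811", "E", "2022-12-9", "0.7071", "-1.64", "25", "46", 789],
        ["002124", "FC", "2022-11-19", "5", "8", "0.8", "-1.46", 834.8],
        ["16321208", "G", "2022-11-10", "1.0440", "83", "", "840", 0.568]]
columns = ["编号", "名称", "日期", "AA", "BB", "CC", "DD", "EE"]

# Collect (row position, number of values) for every row whose length
# does not match the number of column names.
short_rows = [(i, len(row)) for i, row in enumerate(data) if len(row) != len(columns)]

for i, n in short_rows:
    print(f"row {i} has {n} values, expected {len(columns)}")
# row 1 has 7 values, expected 8
```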
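
The new DataFrame obtained after filtering df2 by df1:
       编号 名称          日期   AA  BB     CC     DD       EE
2  519212   B  2022-12-10  3.6  68  -0.77  -3.29  567.000
4  002124  FC  2022-11-19  4.8   8      5  -1.46  834.800

First, we need to build the filter conditions from df1's 条件 column and the values in its other columns. We then apply those conditions to df2 to get the final filtered DataFrame. The following code does this: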
import pandas as pd
# Define df1 and df2
# ...
# Build the filter conditions from the 条件 column and the other columns of df1
conditions = []
for index, row in df1.iterrows():
    condition = row['条件']
    # Pick the column by position: row i of df1 is paired with column i + 1
    column = df1.columns[index + 1]
    if condition == '包含':
        conditions.append(df2[column].astype(str).str.contains(row[column]))
    elif condition == '排除':
        conditions.append(~df2[column].astype(str).str.contains(row[column]))
    elif condition == '大于等于':
        conditions.append(df2[column].astype(float) >= float(row[column]))
    elif condition == '小于等于':
        conditions.append(df2[column].astype(float) <= float(row[column]))
# Apply the conditions to df2
filtered_df2 = df2
for condition in conditions:
    filtered_df2 = filtered_df2[condition]
print(filtered_df2)
This code is meant to output the following filtered DataFrame:
       编号 名称          日期   AA  BB     CC     DD     EE
2  519212   B  2022-12-10  3.6  68  -0.77  -3.29  567.0
4  002124  FC  2022-11-19    5   8    0.8  -1.46  834.8
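Note that this output differs slightly from the example given in the question: in the original data, the record with 编号 002124 has 5 in column AA, not 4.8, and 0.8 in column CC, not 5. Given the stated conditions and data, this is the correct filtering result.

Could you take a look at what is going wrong? An unrelated column is raising an error:
conditions.append(df2[column].astype(float) >= float(row[column]))
ValueError: could not convert string to float: '2022-12-20'

# Define df1 and df2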
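# ...
# Build the filter conditions from the 条件 column and the other columns of df1
conditions = []
for index, row in df1.iterrows():
    condition = row['条件']
    # Find the first non-empty value on this row, skipping the 条件 column itself
    for column in df1.columns[1:]:
        col_value = row[column]
        if col_value != '':
            break
    if col_value == '':
        continue
    if condition == '包含':
        conditions.append(df2[column].astype(str).str.contains(col_value))
    elif condition == '排除':
        conditions.append(~df2[column].astype(str).str.contains(col_value))
    elif condition == '大于等于':
        conditions.append(df2[column].astype(float) >= float(col_value))
    elif condition == '小于等于':
        conditions.append(df2[column].astype(float) <= float(col_value))
# Apply the conditions to df2
filtered_df2 = df2
for condition in conditions:
    # Each mask was built on df2's full index, so align it with the rows still remaining
    filtered_df2 = filtered_df2[condition.loc[filtered_df2.index]]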
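
As an alternative to the if/elif chain above, the same idea can be sketched in table-driven form. This is only a sketch under some assumptions: the names `OPS` and `filter_by_rules` are hypothetical, 包含/排除 are treated as plain substring tests, numeric columns are converted with `pd.to_numeric(..., errors="coerce")` so that non-numeric cells simply fail the comparison, and df2 is reduced to the four columns that df1 actually filters on (with row 1's DD taken as 98, which is where the short row in the post would place it). Unlike the loop above, it applies every non-empty cell on a rule row, not just the first.

```python
import pandas as pd

df1 = pd.DataFrame({"条件": ["包含", "排除", "大于等于", "小于等于"],
                    "编号": ["212", "", "", ""],
                    "名称": ["", "A", "", ""],
                    "日期": ["", "", "", ""],
                    "DD": ["", "", "", "3"],
                    "BB": ["", "", "5", ""]})

# Reduced copy of df2: only the columns referenced by a non-empty cell in df1.
df2 = pd.DataFrame({"编号": ["212345", "021243", "519212", "121811", "002124", "16321208"],
                    "名称": ["A", "C", "B", "E", "FC", "G"],
                    "BB": ["6", "4", "68", "-1.64", "8", "83"],
                    "DD": ["-0.77", 98, "-3.29", "46", "-1.46", "840"]})

# Map each instruction to a function (column, value) -> boolean mask.
OPS = {
    "包含": lambda s, v: s.astype(str).str.contains(v, regex=False),
    "排除": lambda s, v: ~s.astype(str).str.contains(v, regex=False),
    "大于等于": lambda s, v: pd.to_numeric(s, errors="coerce") >= float(v),
    "小于等于": lambda s, v: pd.to_numeric(s, errors="coerce") <= float(v),
}

def filter_by_rules(rules, target, op_col="条件"):
    """Keep the rows of `target` that satisfy every non-empty cell in `rules`."""
    mask = pd.Series(True, index=target.index)
    for _, rule in rules.iterrows():
        for column in rules.columns.drop(op_col):
            value = rule[column]
            if value != "":
                # Combine with AND so that all rules must hold at once.
                mask &= OPS[rule[op_col]](target[column], value)
    return target[mask]

result = filter_by_rules(df1, df2)
print(result["编号"].tolist())  # ['519212', '002124']
```

Building one combined mask and indexing df2 once avoids re-aligning a full-length mask against an already filtered frame on each pass.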