Problem description
I'm trying to replace values in a pandas DataFrame based on criteria over multiple columns. For a criterion on a single column this can be done very elegantly with a dictionary (see e.g. "Remap values in pandas column with a dict"):
import pandas as pd
df = pd.DataFrame({'col1': {0:1, 1:1, 2:2}, 'col2': {0:10, 1:20, 2:20}})
rdict = {1:'a', 2:'b'}
df2 = df.replace({"col1": rdict})
Input df:
col1 col2
0 1 10
1 1 20
2 2 20
Resulting df2:
col1 col2
0 a 10
1 a 20
2 b 20
I'm trying to extend this to criteria over multiple columns (e.g. where col1==1 and col2==10 -> replace). For a single criterion this can be done like this:
df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'c'
This results in df3:
col1 col2
0 c 10
1 1 20
2 2 20
My real-life problem has a large number of criteria, which would require a large number of df3.loc[((criteria1)&(criteria2)), column] = value calls. That is far less elegant than replacement using a dictionary as a "lookup table". Is it possible to extend the elegant solution (df2 = df.replace({"col1": rdict})) to a setup where values in one column are replaced based on criteria over multiple columns?
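A minimal sketch of a middle ground, assuming every criterion is an equality test on col1 and col2: the many .loc calls can be driven by a dictionary of rules in a loop instead of being written out one by one. The name `rules` is hypothetical. The column is cast to object first because recent pandas versions warn when strings are written into an integer column.

```python
import pandas as pd

df = pd.DataFrame({'col1': {0: 1, 1: 1, 2: 2, 3: 2},
                   'col2': {0: 10, 1: 20, 2: 10, 3: 20}})

# Hypothetical rule table: (col1 value, col2 value) -> new value for col1
rules = {(1, 10): 'a', (1, 20): 'b', (2, 10): 'c', (2, 20): 'd'}

df3 = df.copy()
# Allow strings in col1 without an integer-to-string dtype warning
df3['col1'] = df3['col1'].astype(object)

for (v1, v2), new_value in rules.items():
    # Masks come from the untouched df, so earlier replacements
    # cannot change which rows later rules match
    mask = (df['col1'] == v1) & (df['col2'] == v2)
    df3.loc[mask, 'col1'] = new_value

print(df3)
```

The loop still performs one masked assignment per rule, so it scales with the number of rules; it only removes the copy-pasted code.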
An example of what I'm trying to achieve (although in my real-life case the number of criteria is much larger):
df = pd.DataFrame({'col1': {0:1, 1:1, 2:2, 3:2}, 'col2': {0:10, 1:20, 2:10, 3:20}})
df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'a'
df3.loc[((df['col1']==1)&(df['col2']==20)), 'col1'] = 'b'
df3.loc[((df['col1']==2)&(df['col2']==10)), 'col1'] = 'c'
df3.loc[((df['col1']==2)&(df['col2']==20)), 'col1'] = 'd'
Input df:
col1 col2
0 1 10
1 1 20
2 2 10
3 2 20
Resulting df3:
col1 col2
0 a 10
1 b 20
2 c 10
3 d 20
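Staying closer to the dictionary idea, one option (a plain-Python sketch, not an extension of pandas' own replace) is to key an ordinary dict on (col1, col2) tuples and look each row up. The dict name `lookup` is hypothetical; rows whose pair is missing from the dict keep their original col1 value.

```python
import pandas as pd

df = pd.DataFrame({'col1': {0: 1, 1: 1, 2: 2, 3: 2},
                   'col2': {0: 10, 1: 20, 2: 10, 3: 20}})

# Hypothetical lookup table keyed on (col1, col2) pairs
lookup = {(1, 10): 'a', (1, 20): 'b', (2, 10): 'c', (2, 20): 'd'}

df3 = df.copy()
# For each row, build its (col1, col2) key and fall back to the
# original col1 value when the key has no entry in the dict
df3['col1'] = [lookup.get((c1, c2), c1) for c1, c2 in zip(df['col1'], df['col2'])]

print(df3)
```

This keeps the rules in a single dict, much like rdict, at the cost of a Python-level loop over the rows.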
Recommended answer
We can use merge.
Suppose your df looks like this:
df = pd.DataFrame({'col1': {0:1, 1:1, 2:2, 3:2, 4:2, 5:1}, 'col2': {0:10, 1:20, 2:10, 3:20, 4: 20, 5:10}})
col1 col2
0 1 10
1 1 20
2 2 10
3 2 20
4 2 20
5 1 10
Your conditional replacements can then be represented as another dataframe:
df_replace
col1 col2 val
0 1 10 a
1 1 20 b
2 2 10 c
3 2 20 d
(As OP (Bart) pointed out, you can save this in a csv file.)
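For illustration, a sketch of loading df_replace from CSV. The CSV text is inlined through io.StringIO as a stand-in for a real file (an assumption; with a file on disk you would pass its path to pd.read_csv instead). read_csv infers integer dtypes for col1 and col2 here, which matters because the merge keys should have the same dtypes as in df (merging an integer column with a string column raises an error).

```python
import io

import pandas as pd

# Replacement rules as CSV text; in practice this would live in a file
csv_text = """col1,col2,val
1,10,a
1,20,b
2,10,c
2,20,d
"""

# pd.read_csv accepts any file-like object, so StringIO stands in for a file
df_replace = pd.read_csv(io.StringIO(csv_text))
print(df_replace)
print(df_replace.dtypes)
```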
Then you can use
df = df.merge(df_replace, on=["col1", "col2"], how="left")
col1 col2 val
0 1 10 a
1 1 20 b
2 2 10 c
3 2 20 d
4 2 20 d
5 1 10 a
Then you just need to drop col1 (and, if you want the replaced values under the old name, rename val to col1).
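Put together, the whole replacement might look like the sketch below, which drops col1 and renames val to take its place. With how="left", every row of df is kept in its original order. df_replace should contain each (col1, col2) pair only once, because duplicate keys would duplicate the matching rows of df.

```python
import pandas as pd

df = pd.DataFrame({'col1': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 1},
                   'col2': {0: 10, 1: 20, 2: 10, 3: 20, 4: 20, 5: 10}})
df_replace = pd.DataFrame({'col1': [1, 1, 2, 2],
                           'col2': [10, 20, 10, 20],
                           'val': ['a', 'b', 'c', 'd']})

# Left merge attaches the matching 'val' to every row of df
out = df.merge(df_replace, on=["col1", "col2"], how="left")

# Drop the original col1 and let 'val' take over its name
out = out.drop(columns="col1").rename(columns={"val": "col1"})
# Restore the original column order
out = out[["col1", "col2"]]

print(out)
```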
As MaxU pointed out, some rows might not match any rule and would end up as NaN. We can use a line like
df["val"] = df["val"].combine_first(df["col1"])
to fill in the value from col1 wherever the merged value is NaN.
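A small sketch with one row that matches no rule, showing the NaN and the combine_first fallback. The extra pair (3, 30) is made up for illustration. indicator=True is a standard merge option that adds a _merge column ("both" or "left_only"); it is used here only to list the unmatched rows and is dropped afterwards.

```python
import pandas as pd

# Row 2 holds the pair (3, 30), for which df_replace has no rule (made-up data)
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})
df_replace = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20], 'val': ['a', 'd']})

# indicator=True adds a '_merge' column marking which rows found a rule
out = df.merge(df_replace, on=["col1", "col2"], how="left", indicator=True)
unmatched = out.loc[out["_merge"] == "left_only", ["col1", "col2"]]
print(unmatched)  # rows whose 'val' is NaN after the merge

# Fall back to the original col1 value wherever no rule matched
out["val"] = out["val"].combine_first(out["col1"])
out = out.drop(columns="_merge")
print(out)
```

The resulting val column mixes strings and integers (object dtype), which is the same mix the question's .loc approach produces.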