Problem description
My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to use some conditional logic to create another column called WINNER based on pw1 and pw2.
+-------------------------+-------------+-----------+-------------+
| Name1 | pw1 | Name2 | pw2 |
+-------------------------+-------------+-----------+-------------+
| Seaking | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn | 0.172510623 | Quagsire | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy | 0.28681284 | NaN | NaN |
+-------------------------+-------------+-----------+-------------+
I want to do this conditionally in a function, but I'm having some trouble.
- if pw1 > pw2, populate with Name1
- if pw2 > pw1, populate with Name2
- if pw1 is populated but pw2 isn't, populate with Name1
- if pw2 is populated but pw1 isn't, populate with Name2
But my function isn't working: for some reason, checking whether a value is null doesn't work.
    def final_winner(df):
        # If PW1 is missing and PW2 is populated, Pokemon 1 wins
        if df['pw1'] = None and df['pw2'] != None:
            return df['Number1']
        # If it's the same thing but the other way around, Pokemon 2 wins
        elif df['pw2'] = None and df['pw1'] != None:
            return df['Number2']
        # If pw2 is greater than pw1, then Pokemon 2 wins
        elif df['pw2'] > df['pw1']:
            return df['Number2']
        else
            return df['Number1']

    pw2['Winner'] = pw2.apply(final_winner, axis=1)
Recommended answer
Do not use apply here. With axis=1 it calls a Python function once per row, which is very slow on large frames. Use np.where instead:
    import numpy as np
    # Treat a missing pw2 as the lowest possible value so it always loses
    pw2 = df.pw2.fillna(-np.inf)
    df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
Since a NaN should always lose, you can just fillna() pw2 with -np.inf to get the same logic: any populated pw1 beats -np.inf, and a missing pw1 never compares as greater, so Name2 is chosen.
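As a rough check of the idea above, here is a minimal sketch that rebuilds the table from the question and applies the np.where approach. The first four rows are copied from the question; the fifth row ("Pikachu" vs "Eevee") is made up purely to exercise the case where pw1 is missing, and the name pw2_filled is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# First four rows come from the question's table.
# The last row is hypothetical, added only to cover "pw1 missing, pw2 populated".
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy", "Pikachu"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284, np.nan],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan, "Eevee"],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan, 0.4],
})

# A missing pw2 becomes -inf, so any populated pw1 is greater than it.
pw2_filled = df["pw2"].fillna(-np.inf)

# A missing pw1 makes the comparison False, so Name2 is picked in that case.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

Note that a row where both probabilities are missing would fall through to Name2, which is the same outcome as the final else branch being skipped in favour of whatever Name2 holds; handle that case separately if it matters for your data.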
Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is an assignment, not a comparison, and is invalid Python syntax inside an if condition. You usually want to compare things using the == operator. For None, however, it is recommended to use is, as in if variable is None: (...). But then again, you are in a pandas/numpy environment, where there are actually several representations of a missing value (None, NaN, NaT, etc.).
So it is preferable to check for nulls using pd.isnull() or df.isnull().
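To make the point about null checks concrete, here is a small sketch of how comparisons against None behave for a missing float, next to pd.isnull(). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

missing = np.nan  # pandas stores a missing float as NaN, not None

# Comparing NaN with None never detects the missing value.
eq_none = missing == None      # False: NaN is not equal to None
ne_none = missing != None      # True: so "!= None" treats NaN as populated
eq_self = missing == missing   # False: NaN is not even equal to itself

# pd.isnull recognises the different missing-value markers.
null_nan = pd.isnull(np.nan)
null_none = pd.isnull(None)
null_nat = pd.isnull(pd.NaT)

# The same check works element-wise on a Series.
mask = pd.Series([0.5, np.nan, None]).isnull().tolist()

print(eq_none, ne_none, eq_self)
print(null_nan, null_none, null_nat)
print(mask)
```

This is why the original checks against None let NaN rows slip through to the final comparison branches.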
Just to illustrate, this is how your code should look:
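    def final_winner(df):
        # pw1 is missing and pw2 is populated: Pokemon 2 wins
        if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
            return df['Name2']
        # pw2 is missing and pw1 is populated: Pokemon 1 wins
        elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
            return df['Name1']
        # Both populated: the higher probability wins
        elif df['pw2'] > df['pw1']:
            return df['Name2']
        else:
            return df['Name1']

    df['winner'] = df.apply(final_winner, axis=1)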
But again, definitely use np.where.