Using conditional if/else logic on pandas DataFrame columns

Problem description

My DataFrame, called pw2, looks something like the table below. Two of its columns, pw1 and pw2, are win probabilities. I'd like to apply some conditional logic to create another column, WINNER, based on pw1 and pw2.

+-------------------------+-------------+-----------+-------------+
|          Name1          |     pw1     |   Name2   |     pw2     |
+-------------------------+-------------+-----------+-------------+
| Seaking                 | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn              | 0.172510623 | Quagsire  | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy                 | 0.28681284  | NaN       | NaN         |
+-------------------------+-------------+-----------+-------------+

I want to do this conditionally in a function but I'm having some trouble.

if pw1 > pw2, populate with Name1
if pw2 > pw1, populate with Name2
if pw1 is populated but pw2 isn't, populate with Name1
if pw2 is populated but pw1 isn't, populate with Name2
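The four rules above map one-to-one onto vectorized boolean conditions. As a minimal sketch (not taken from the original question or answer), numpy's np.select takes one mask per rule and returns the choice belonging to the first mask that is True. The frame is called df here to avoid clashing with the pw2 column, and giving None to ties and to rows where both probabilities are missing is an assumption, since the rules don't cover those cases.

```python
import numpy as np
import pandas as pd

# Sample data from the table in the question
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# One boolean mask per rule; comparisons involving NaN are always False,
# so rules 1 and 2 never fire when a probability is missing.
conditions = [
    df["pw1"] > df["pw2"],                  # rule 1: pw1 > pw2
    df["pw2"] > df["pw1"],                  # rule 2: pw2 > pw1
    df["pw1"].notna() & df["pw2"].isna(),   # rule 3: only pw1 populated
    df["pw2"].notna() & df["pw1"].isna(),   # rule 4: only pw2 populated
]
choices = [df["Name1"], df["Name2"], df["Name1"], df["Name2"]]

# Rows matching no rule (ties, or both missing) get None (assumption)
df["WINNER"] = np.select(conditions, choices, default=None)
print(df[["Name1", "Name2", "WINNER"]])
```

Compared with a single np.where call, this spells out every rule explicitly at the cost of a little more code.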

But my function isn't working - for some reason checking if a value is null isn't working.

def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)

Recommended answer

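Don't use apply here: with axis=1 it calls your Python function once per row, which is very slow on large frames. Use the vectorized np.where instead: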

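import numpy as np
# Treat a missing pw2 as -inf so that any populated pw1 beats it
pw2_filled = df['pw2'].fillna(-np.inf)
df['winner'] = np.where(df['pw1'] > pw2_filled, df['Name1'], df['Name2'])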

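Since a NaN should always lose, you can simply fillna() the pw2 column with -np.inf and get the same logic: any populated pw1 is greater than -inf, so Name1 wins when pw2 is missing, while any comparison with a NaN pw1 evaluates to False, so Name2 wins when pw1 is missing.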
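To check this reasoning, here is the fillna(-np.inf) plus np.where approach run on the sample table from the question. The extra Eevee/Pikachu row is hypothetical (it is not in the original data) and only exists to exercise the case where pw1 is missing.

```python
import numpy as np
import pandas as pd

# Sample table from the question, plus one hypothetical row (not in the
# original data) where pw1 is missing, to exercise the NaN-comparison case.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy", "Eevee"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284, np.nan],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan, "Pikachu"],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan, 0.4],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it.
pw2_filled = df["pw2"].fillna(-np.inf)

# A missing pw1 stays NaN; NaN > anything is False, so Name2 wins that row.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])
print(df[["Name1", "Name2", "winner"]])
```

Note that an exact tie (pw1 == pw2) also falls through to Name2 with this approach.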

Looking at your code, we can point out several problems. First, you wrote df['pw1'] = None, but = is assignment, not comparison, and using it in an if condition is invalid Python syntax. You usually want to compare things using the == operator. For None, however, the recommended test is the identity operator is, as in if variable is None: (...). But again, you are in a pandas/numpy environment, where there are actually several null values (None, NaN, NaT, etc.). In particular, a missing value in a float column such as pw1 is stored as NaN, not None, so even an is None check would not detect it.
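A small sketch of both pitfalls, assuming Python 3 and pandas: compile() shows that = inside an if is rejected before the code ever runs, and a missing value read back from a float column is a NaN that neither `is None` nor `== np.nan` detects.

```python
import numpy as np
import pandas as pd

# `=` inside an `if` is assignment, not comparison, so Python rejects it
# at compile time, before anything runs.
try:
    compile("if x = None:\n    pass\n", "<example>", "exec")
    assignment_rejected = False
except SyntaxError:
    assignment_rejected = True

# In a float column pandas stores a missing value as NaN, not None.
s = pd.Series([0.5, None])
missing = s.iloc[1]

is_none = missing is None               # False: the value is a NaN float
equals_nan = bool(missing == np.nan)    # False: NaN never equals anything, even NaN

print(assignment_rejected, is_none, equals_nan)
```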

So it is preferable to check for nulls using pd.isnull() or the df.isnull() method, both of which treat None, NaN and NaT as null.
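A minimal sketch of these checks; the values and the one-row frame are illustrative only. pd.isnull is an alias of pd.isna, and the DataFrame/Series method works element-wise, so no row loop is needed.

```python
import numpy as np
import pandas as pd

# pd.isnull recognizes every null marker pandas uses, and nothing else.
checks = {
    "None": pd.isnull(None),
    "np.nan": pd.isnull(np.nan),
    "pd.NaT": pd.isnull(pd.NaT),
    "0.5": pd.isnull(0.5),
    "'Flaaffy'": pd.isnull("Flaaffy"),
}
print(checks)

# On a whole frame, .isnull() returns a boolean frame of the same shape.
frame = pd.DataFrame({"pw1": [0.28681284], "pw2": [np.nan]})
mask = frame.isnull().iloc[0].tolist()
print(mask)
```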

Just to illustrate, this is how your code should look:

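import pandas as pd

def final_winner(row):
    # pw1 missing, pw2 populated: Pokemon 2 wins
    if pd.isnull(row['pw1']) and not pd.isnull(row['pw2']):
        return row['Name2']
    # pw2 missing, pw1 populated: Pokemon 1 wins
    elif pd.isnull(row['pw2']) and not pd.isnull(row['pw1']):
        return row['Name1']
    # Otherwise compare probabilities; ties (and rows where both are missing) go to Pokemon 1
    elif row['pw2'] > row['pw1']:
        return row['Name2']
    else:
        return row['Name1']

df['winner'] = df.apply(final_winner, axis=1)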

But again, you should definitely use np.where.
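To see the speed difference for yourself, the sketch below times a row-wise apply against np.where on a hypothetical synthetic frame (random probabilities and placeholder names, not the question's data). Absolute timings depend on your machine and pandas version, so no numbers are given here; apply with axis=1 is typically much slower.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 20,000 rows with roughly 10% of pw2 missing.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "Name1": [f"p1_{i}" for i in range(n)],  # placeholder names
    "pw1": rng.random(n),
    "Name2": [f"p2_{i}" for i in range(n)],
    "pw2": rng.random(n),
})
df.loc[rng.random(n) < 0.1, "pw2"] = np.nan


def final_winner(row):
    # Row-wise version of the four rules from the question
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    elif pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    elif row["pw2"] > row["pw1"]:
        return row["Name2"]
    else:
        return row["Name1"]


start = time.perf_counter()
via_apply = df.apply(final_winner, axis=1)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
via_where = np.where(df["pw1"] > df["pw2"].fillna(-np.inf), df["Name1"], df["Name2"])
where_seconds = time.perf_counter() - start

# Both should pick the same winners here (pw1 is never missing, ties are
# practically impossible with random floats).
same = bool((via_apply.to_numpy() == via_where).all())
print(f"apply: {apply_seconds:.4f}s  np.where: {where_seconds:.4f}s  same: {same}")
```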

这篇关于对 pandas 数据框列使用条件if / else逻辑的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！