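I have a very long and wide data frame. I want to create a new column in it whose value depends on many other columns in the df. The calculation needed for a value in this new column also changes, depending on the value in one other column.

The answers to this question come close, but do not quite work for me.

In the end I will have about 30 different calculations to apply, so I am not keen on np.where, which is not very readable with that many conditions.

I have also been strongly advised not to loop over all rows of the data frame with a for loop, because that is supposed to hurt performance (please correct me if I am wrong).

What I tried is: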

import pandas as pd
import numpy as np

# The data in my columns looks something like this:
df = pd.DataFrame()
df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3 , 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to check against to decide upon which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    (df['text'] is None),
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]
choices = [0,
           round(df['values2'] * 0.5 * df['values3'], 2),
           df['values1'] + df['values2'] - df['values3'],
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)


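I expected that, based on the row value in df['text'], the correct calculation would be applied to the same row of df['mynewvalue'].

Instead I get the error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I program this so that I can use conditions like these to define the correct calculation for the df['mynewvalue'] column?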

Best answer

The error comes from these conditions:

conditions = [
    ... ,
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]


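You are asking whether a whole column of elements is in a list. The answer is not a single True or False but one boolean per element. As the error message suggests, you have to decide whether the condition holds when at least one element satisfies the property (any) or only when all elements satisfy it (all).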
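To make the ambiguity concrete, here is a minimal sketch using a small, made-up Series (the names text, some_list and mask are illustrative only, not from the question). It shows that the in operator needs one truth value and therefore raises, while an element-wise comparison yields one boolean per row that any and all can then reduce.

```python
import pandas as pd

# Illustrative data, not the asker's full frame
text = pd.Series(['dab', 'def', 'bla'])
some_list = ['dab', 'bla']

# `in` compares each list item with the whole Series and then asks for
# a single True/False, which pandas refuses to guess.
try:
    text in some_list
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# An element-wise comparison gives one boolean per row instead.
mask = text == 'dab'
print(mask.tolist())      # [True, False, False]
print(bool(mask.any()))   # True: at least one row matches
print(bool(mask.all()))   # False: not every row matches
```

For np.select you want neither reduction: each condition should stay a per-row boolean mask.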

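One solution is to use the pandas isin method, which tests each element of the column for membership in the list and returns a boolean Series that np.select can use directly.

Here using isin: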

import pandas as pd
import numpy as np

# The data in my columns looks something like this:
df = pd.DataFrame()

df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3, 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to test against to decide which calculation applies
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    (df['text'].isna()),  # element-wise null check; "is None" on a Series is always False
    (df['text'].isin(someList)),
    (df['text'].isin(someOtherList)),
    (df['text'].isin(someThirdList))]
choices = [0,
           round(df['values2'] * 0.5 * df['values3'], 2),
           df['values1'] + df['values2'] - df['values3'],
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
#    text  values1  values2  values3  mynewvalue
# 0   dab        3        6      103       309.0
# 1   def        4        3      444      -437.0
# 2   bla        2       21       33       346.5
# 3  zdag        5       44      425      -376.0
# 4   etc        2       22      200       251.0
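
Since the question mentions roughly 30 calculations, one possible way to keep this readable is to hold the label lists and their calculations together in a single table and build conditions and choices from it. This is only a sketch under assumptions: the rules list and its lambda functions are hypothetical names introduced here, and the three calculations are the ones from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['dab', 'def', 'bla', 'zdag', 'etc'],
    'values1': [3, 4, 2, 5, 2],
    'values2': [6, 3, 21, 44, 22],
    'values3': [103, 444, 33, 425, 200],
})

# Hypothetical rule table: each entry pairs the labels it applies to
# with a function that computes the candidate value for every row.
rules = [
    (['dab', 'bla'], lambda d: (d['values2'] * 0.5 * d['values3']).round(2)),
    (['def', 'zdag'], lambda d: d['values1'] + d['values2'] - d['values3']),
    (['etc'], lambda d: d['values1'] + 249),
]

# One boolean mask and one candidate column per rule, in the same order
conditions = [df['text'].isin(labels) for labels, _ in rules]
choices = [calc(df) for _, calc in rules]

# Rows matching no rule (including missing text) fall back to the default
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
```

Adding a calculation then means adding one entry to rules. Note that, as with the version above, every calculation is still evaluated for all rows and np.select only picks which result to keep.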
