The following code creates a random DataFrame whose values are -1, 0 or 1:
df = pd.DataFrame(np.random.randint(-1,2,size=(100, 1)), columns=['val'])
print(df['val'].value_counts())
Let's see what it contains:
-1 36
0 35
1 29
Name: val, dtype: int64
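Then I tried to create a new column named mysum holding a conditional cumulative sum that follows these rules (mysum on the right-hand side is the value from the previous row, starting at 0):
If val = 1 and mysum >= 0, then mysum = mysum + 1.
If val = 1 and mysum < 0, then mysum = mysum + 2.
If val = -1 and mysum <= 0, then mysum = mysum - 1.
If val = -1 and mysum > 0, then mysum = mysum - 2.
If val = 0 and mysum < 0, then mysum = mysum + 1.
If val = 0 and mysum > 0, then mysum = mysum - 1.
If val = 0 and mysum = 0, then mysum = mysum.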
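Read together, the rules define a recurrence: each new mysum depends only on the previous mysum and the current val. Below is a minimal sketch of one step as a plain Python function. The name next_mysum is hypothetical (it is not from the question), and the sketch assumes the running total starts at 0.

```python
def next_mysum(mysum: int, val: int) -> int:
    """Apply one step of the conditional running-sum rules."""
    if val == 1:
        # step up by 1, or by 2 when the total is currently negative
        return mysum + 1 if mysum >= 0 else mysum + 2
    if val == -1:
        # step down by 1, or by 2 when the total is currently positive
        return mysum - 1 if mysum <= 0 else mysum - 2
    # val == 0: move one step back toward zero (or stay at zero)
    if mysum > 0:
        return mysum - 1
    if mysum < 0:
        return mysum + 1
    return mysum


# Worked example on a short hand-picked sequence
mysum = 0
history = []
for v in [1, 1, -1, 0, -1, 0]:
    mysum = next_mysum(mysum, v)
    history.append(mysum)
print(history)  # [1, 2, 0, 0, -1, 0]
```

Note the jump from 2 to 0 on the first -1: because the total is positive, the rule subtracts 2 rather than 1.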
So I'm afraid it isn't as simple as:
df['mysum'] = df['val'].cumsum()
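To see why a plain cumsum is not enough, compare it with the rules on a tiny hand-picked input (values chosen only for illustration). For val = 1, -1, -1, 1 the plain cumsum gives 1, 0, -1, 0, while the rules give 1, -1, -2, 0, because a -1 applied to a positive total subtracts 2 and a 1 applied to a negative total adds 2.

```python
import pandas as pd

df = pd.DataFrame({'val': [1, -1, -1, 1]})

# Plain cumulative sum: adds val regardless of the sign of the running total
df['cumsum'] = df['val'].cumsum()

# Conditional running sum from the rules, evaluated row by row
mysum = 0
values = []
for v in df['val']:
    if v == 1:
        mysum = mysum + 1 if mysum >= 0 else mysum + 2
    elif v == -1:
        mysum = mysum - 1 if mysum <= 0 else mysum - 2
    elif mysum > 0:
        mysum -= 1
    elif mysum < 0:
        mysum += 1
    values.append(mysum)
df['mysum'] = values

print(df)
```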
So I tried the following:
df['mysum'] = 0
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() >= 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 2), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() <= 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == -1) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 2), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() > 0), (df['mysum'].cumsum() - 1), df['mysum'].cumsum())
df['mysum'] = np.where((df['val'] == 0) & (df['mysum'].cumsum() < 0), (df['mysum'].cumsum() + 1), df['mysum'].cumsum())
print(df['mysum'].value_counts())
print(df)
But the mysum column doesn't accumulate! Here is a fiddle you can try: https://repl.it/FaXZ/8
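One likely reason the np.where attempt does not accumulate: each np.where call evaluates its condition and both branches for every row at once, using the column as it was before that statement ran. No row can see the value just assigned to the row above it, so the recurrence is never applied. A minimal illustration on a hypothetical 4-row frame, where the rules would give 1, 2, 3, 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, 1, 1, -1]})
df['mysum'] = 0

# First np.where line from the question. The all-zero column has an
# all-zero cumsum, so every row with val == 1 simply becomes 0 + 1;
# the earlier rows' new values never feed into the later rows.
df['mysum'] = np.where((df['val'] == 1) & (df['mysum'].cumsum() >= 0),
                       df['mysum'].cumsum() + 1,
                       df['mysum'].cumsum())
print(df['mysum'].tolist())  # [1, 1, 1, 0]
```

The remaining np.where lines have the same problem: each one maps rows independently, so chaining them cannot reproduce a row-by-row dependency.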
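Best answer
There may be a more streamlined solution, but you can iterate over the DataFrame row by row and set each value according to your conditions.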
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])
df['mysum'] = 0

for index, row in df.iterrows():
    # get the previous value of mysum = mysum one row above the current index
    # (mysum at the beginning is 0)
    if index == 0:
        mysum = 0
    else:
        # DataFrame.get_value/set_value were removed in pandas 1.0; .iat is the positional replacement
        mysum = df.iat[index - 1, 1]

    # set the value at the current index according to the conditions;
    # they are mutually exclusive, so exactly one branch applies
    val = row['val']
    if val == 0 and mysum < 0:
        df.iat[index, 1] = mysum + 1
    if val == 1 and mysum < 0:
        df.iat[index, 1] = mysum + 2
    if val == -1 and mysum <= 0:
        df.iat[index, 1] = mysum - 1
    if val == 0 and mysum > 0:
        df.iat[index, 1] = mysum - 1
    if val == -1 and mysum > 0:
        df.iat[index, 1] = mysum - 2
    if val == 1 and mysum >= 0:
        df.iat[index, 1] = mysum + 1
    if val == 0 and mysum == 0:
        df.iat[index, 1] = mysum

print(df)
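Since every row only needs the previous mysum and the current val, the same result can also be computed without writing into the DataFrame on every iteration. The sketch below swaps in itertools.accumulate with a step function; next_mysum is a hypothetical helper name, and the initial= argument requires Python 3.8 or later. It is an alternative under those assumptions, not the answer's method.

```python
from itertools import accumulate

import numpy as np
import pandas as pd


def next_mysum(mysum, val):
    """One step of the conditional running-sum rules (hypothetical helper)."""
    if val == 1:
        return mysum + 1 if mysum >= 0 else mysum + 2
    if val == -1:
        return mysum - 1 if mysum <= 0 else mysum - 2
    if mysum > 0:
        return mysum - 1
    if mysum < 0:
        return mysum + 1
    return mysum


df = pd.DataFrame(np.random.randint(-1, 2, size=(100, 1)), columns=['val'])

# accumulate calls next_mysum(running_total, val) for each row; with
# initial=0 it yields the starting 0 first, so drop that element.
df['mysum'] = list(accumulate(df['val'], next_mysum, initial=0))[1:]
print(df.head())
```

The loop still runs in Python, so it is not vectorized, but it avoids iterrows and per-cell writes, which are usually the slow parts.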