Problem description
Suppose I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
                   'Date': ['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15',
                            '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10'],
                   'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400]})
df['Date'] = pd.to_datetime(df['Date'])
df
Event Date Sale
A 2019-01-01 100
B 2019-02-01 200
A 2019-03-01 150
A 2019-03-01 200
B 2019-02-15 150
C 2019-03-15 100
B 2019-04-05 300
B 2019-04-05 250
A 2019-04-15 500
C 2019-06-10 400
I would like to obtain the following result:
Event Date Sale Total_Previous_Sale
A 2019-01-01 100 0
B 2019-02-01 200 0
A 2019-03-01 150 100
A 2019-03-01 200 100
B 2019-02-15 150 200
C 2019-03-15 100 0
B 2019-04-05 300 350
B 2019-04-05 250 350
A 2019-04-15 500 450
C 2019-06-10 400 100
where df['Total_Previous_Sale'] is the total sale (df['Sale']) of the same event (df['Event']) over all rows whose date (df['Date']) is strictly earlier than the current row's date. For instance,
- the total sale of event A before 2019-01-01 is 0,
- the total sale of event A before 2019-03-01 is 100, and
- the total sale of event A before 2019-04-15 is 100 + 150 + 200 = 450.
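The three sums above can be checked with a short sketch. It keeps only the event A rows from the example (the frame name `a` is illustrative) and, for each row, adds up the sales dated strictly earlier, so both rows on 2019-03-01 get 100 rather than counting each other.

```python
import pandas as pd

# Event A rows copied from the example DataFrame
a = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-01', '2019-03-01', '2019-03-01', '2019-04-15']),
    'Sale': [100, 150, 200, 500],
})

# For each row, sum the sales whose date is strictly earlier than that row's date
prev = [int(a.loc[a['Date'] < d, 'Sale'].sum()) for d in a['Date']]
print(prev)  # [0, 100, 100, 450]
```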
Basically, it is almost the same as a conditional cumulative sum, but over previous values only (excluding the current row and any other rows on the same date). I am able to obtain the desired result using this line:
# For each row, sum the sales of the same event on strictly earlier dates
df['Total_Previous_Sale'] = [df.loc[(df['Event'] == df.loc[i, 'Event']) & (df['Date'] < df.loc[i, 'Date']),
                                    'Sale'].sum() for i in range(len(df))]
It works fine, but it is slow, since it rescans the whole DataFrame for every row. I believe there is a better and faster way to do this. I have tried these lines:
df['Total_Previous_Sale'] = df[df['Date'] < df['Date']].groupby(['Event'])['Sale'].cumsum()
or
df['Total_Previous_Sale'] = df.groupby(['Event'])['Sale'].shift(1).cumsum().fillna(0)
but they produce NaNs or an incorrect result.
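To see why the two attempts fail, here is a small diagnostic sketch that rebuilds the example frame and evaluates each expression; the names `mask`, `attempt2` and `per_event` are illustrative only. The last part also suggests that even a correctly grouped "cumsum minus current value" miscounts rows that share a date.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
    'Date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15',
                            '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10']),
    'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400],
})

# Attempt 1: a date is never strictly less than itself, so the mask is all False,
# the filtered frame is empty, and assigning its cumsum back yields only NaN
mask = df['Date'] < df['Date']
print(mask.any())  # False

# Attempt 2: shift(1) is applied per event, but .cumsum() then runs down the
# whole column, adding up sales from different events
attempt2 = df.groupby('Event')['Sale'].shift(1).cumsum().fillna(0)
print(attempt2.tolist())

# A per-event cumsum minus the current sale still miscounts same-day rows:
# the second A sale on 2019-03-01 gets 250 (100 + 150) instead of 100
per_event = df.groupby('Event')['Sale'].cumsum() - df['Sale']
print(per_event.tolist())  # [0, 0, 100, 250, 200, 0, 350, 650, 450, 100]
```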
Recommended answer
I finally found a better and faster way to get the desired result, and it turns out to be quite easy:
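# Running total per event minus running total within the same (Event, Date):
# what remains is the sales on earlier dates (rows must be in date order within each event)
df['Total_Previous_Sale'] = df.groupby('Event')['Sale'].cumsum() \
    - df.groupby(['Event', 'Date'])['Sale'].cumsum()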
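The one-liner relies on row order: `cumsum` accumulates in the order rows appear, so it gives the right totals only when each event's rows are already sorted by date, as they happen to be in the example. A minimal sketch of an order-safe variant, assuming it is acceptable to sort by `Date` first, applies the same subtraction on a date-sorted copy and reindexes the result back to the original rows. The helper name `previous_sale_total` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'],
    'Date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01', '2019-03-01', '2019-02-15',
                            '2019-03-15', '2019-04-05', '2019-04-05', '2019-04-15', '2019-06-10']),
    'Sale': [100, 200, 150, 200, 150, 100, 300, 250, 500, 400],
})


def previous_sale_total(frame):
    """Sum of 'Sale' for the same 'Event' on strictly earlier dates (hypothetical helper)."""
    # Sort by date so that, within each event, every earlier-dated row comes first
    s = frame.sort_values('Date')
    # Running total per event minus running total within the same (Event, Date)
    # leaves only the sales from earlier dates, so same-day rows are excluded
    total = s.groupby('Event')['Sale'].cumsum() - s.groupby(['Event', 'Date'])['Sale'].cumsum()
    # Put the result back in the caller's row order
    return total.reindex(frame.index)


df['Total_Previous_Sale'] = previous_sale_total(df)
print(df['Total_Previous_Sale'].tolist())  # [0, 0, 100, 100, 200, 0, 350, 350, 450, 100]

# Same data in reverse row order: the unsorted one-liner goes wrong for row 8,
# while the sorted version still gives it the correct total
rev = df[['Event', 'Date', 'Sale']].iloc[::-1]
naive = rev.groupby('Event')['Sale'].cumsum() - rev.groupby(['Event', 'Date'])['Sale'].cumsum()
print(naive[8], previous_sale_total(rev)[8])  # 0 450
```

Sorting costs O(n log n) and both `groupby(...).cumsum()` calls are vectorized, so this should still be much faster than the per-row loop in the question, which rescans the frame once per row.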