Problem description
I have a DataFrame with a MultiIndex (stock and datetime) and a dummy column containing 1s and 0s. For each stock and each day, I would like to count, row by row, how many consecutive times the current value has occurred in the 'Dummy' column. The count restarts at 1 with every new run, counting up for 1s and counting down (negative) for 0s. In the example below, the 'Counter' column shows what I would like to create:
import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT'],
    'datetime': ['2015-01-02 20:57', '2015-01-02 20:58', '2015-01-02 20:59', '2015-01-02 21:00',
                 '2015-01-03 20:57', '2015-01-03 20:58', '2015-01-03 20:59',
                 '2015-01-02 20:57', '2015-01-02 20:58'],
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1, 1],
    # Desired result: runs of 1s count up, runs of 0s count down
    'Counter': [-1, -2, 1, 2, 1, 2, -1, 1, 2],
})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)
A simpler version of this problem was answered here; that answer, however, ignores the tickers and dates.
Recommended answer
Just modify your previous solution slightly: group additionally by stock and by calendar date, so that the counter restarts for each stock and each day.
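import numpy as np

# Give each run of identical consecutive Dummy values its own id
m = df.Dummy.diff().ne(0).cumsum()
# Number the rows within each (stock, date, run) group, starting at 1
counters = df.groupby([df.index.get_level_values(0),
                       df.index.get_level_values(1).date,
                       m]).cumcount() + 1
# Runs of 0s get a negative sign, runs of 1s a positive one
df['Counter'] = np.where(df['Dummy'] == 0, -1, 1) * counters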
Out[95]:
                           Dummy  Counter
stock datetime
AAPL  2015-01-02 20:57:00      0       -1
      2015-01-02 20:58:00      0       -2
      2015-01-02 20:59:00      1        1
      2015-01-02 21:00:00      1        2
      2015-01-03 20:57:00      1        1
      2015-01-03 20:58:00      1        2
      2015-01-03 20:59:00      0       -1
MSFT  2015-01-02 20:57:00      1        1
      2015-01-02 20:58:00      1        2
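For anyone who wants to check the approach end to end, here is a self-contained sketch that wraps the same steps in a function and runs it on the sample data from the question. The helper name `signed_run_counter` and its `col` parameter are made up for illustration and are not part of pandas. The sketch assumes that index level 0 holds the stock and level 1 holds the timestamps, and that the rows are already sorted chronologically within each stock.

```python
import numpy as np
import pandas as pd


def signed_run_counter(df, col="Dummy"):
    """Hypothetical helper: signed position of each row within its run of
    identical values in `col`, restarting for every stock and calendar day."""
    stock = df.index.get_level_values(0)      # assumed: level 0 is the stock
    day = df.index.get_level_values(1).date   # assumed: level 1 is a datetime
    # A new run starts wherever the value differs from the previous row.
    run_id = df[col].diff().ne(0).cumsum()
    # 1-based position of each row inside its (stock, day, run) group.
    position = df.groupby([stock, day, run_id]).cumcount() + 1
    # Rows with 0 count down (negative), all other rows count up.
    sign = np.where(df[col].eq(0), -1, 1)
    return position * sign


# Sample data from the question, without the hand-written Counter column.
df = pd.DataFrame({
    "stock": ["AAPL"] * 7 + ["MSFT"] * 2,
    "datetime": pd.to_datetime([
        "2015-01-02 20:57", "2015-01-02 20:58", "2015-01-02 20:59",
        "2015-01-02 21:00", "2015-01-03 20:57", "2015-01-03 20:58",
        "2015-01-03 20:59", "2015-01-02 20:57", "2015-01-02 20:58",
    ]),
    "Dummy": [0, 0, 1, 1, 1, 1, 0, 1, 1],
}).set_index(["stock", "datetime"])

result = signed_run_counter(df)
df["Counter"] = result
print(df)
```

The run ids are computed over the whole frame without regard to stock or day. That is harmless here because the stock and the date are also grouping keys, so a run that spans a day or ticker boundary is split into separate groups and its count restarts at 1.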