This article describes how to count consecutive occurrences of a number in a DataFrame with multi-index daily data.

Problem description

A version of this question has already been answered here; that answer, however, uses minute-frequency data:

Counting the number of consecutive occurences of numbers in dataframe with multi index

I have a dataframe with a multi-index (stock ticker and date) and a 'Dummy' column that contains 1s and 0s. For each stock, I would like to count, in each row, how many consecutive times the current value has occurred in the 'Dummy' column. The count restarts every time the value changes, counting up (1, 2, ...) for 1s and counting down (-1, -2, ...) for 0s. In the example below, the 'Counter' column represents what I would like to create:

import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT', 'MSFT'],
    'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05',
                 '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1],
    'Counter': [-1, -2, 1, 2, 1, 2, -1, 1]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)
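
To make the intended rule concrete, here is a minimal pure-Python sketch of the counting logic for a single stock. The helper name signed_run_counter is hypothetical and only illustrates the specification; it is not part of pandas and not the approach used in the answer.

```python
from itertools import groupby


def signed_run_counter(values):
    """Return the signed running length of each run of equal 0/1 values.

    Hypothetical helper for illustration: runs of 1s count up (1, 2, ...),
    runs of 0s count down (-1, -2, ...), restarting whenever the value changes.
    """
    counter = []
    for value, run in groupby(values):
        sign = 1 if value == 1 else -1
        for position, _ in enumerate(run, start=1):
            counter.append(sign * position)
    return counter


aapl = signed_run_counter([0, 0, 1, 1])
msft = signed_run_counter([1, 1, 0, 1])
print(aapl)  # [-1, -2, 1, 2]
print(msft)  # [1, 2, -1, 1]
```

Both results match the 'Counter' values given for AAPL and MSFT in the example data.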

Solution

Try something like:

import pandas as pd

df = pd.DataFrame({
    'stock': ['AAPL', 'AAPL', 'AAPL', 'AAPL',
              'MSFT', 'MSFT', 'MSFT', 'MSFT'],
    'datetime': ['2015-01-02', '2015-01-03',
                 '2015-01-04', '2015-01-05',
                 '2015-01-02', '2015-01-03',
                 '2015-01-04', '2015-01-05'],
    'Dummy': [0, 0, 1, 1, 1, 1, 0, 1]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index(['stock', 'datetime'], inplace=True)

# Give each run of consecutive equal Dummy values within a stock its own group id
df['group'] = df.groupby('stock')['Dummy'] \
    .transform(lambda g: g.ne(g.shift()).cumsum())
# Map each Dummy value to a step in Counter: 1 -> 1, 0 -> -1
df['Counter'] = df['Dummy'].apply(lambda x: 1 if x == 1 else -1)
# Cumulative sum of the steps within each (stock, group) run
df['Counter'] = df.groupby(['stock', 'group'])['Counter'].cumsum().astype(int)
# Drop Group Column
df = df.drop(columns='group')

# For Display
print(df.to_string())

Output:

                  Dummy  Counter
stock datetime
AAPL  2015-01-02      0       -1
      2015-01-03      0       -2
      2015-01-04      1        1
      2015-01-05      1        2
MSFT  2015-01-02      1        1
      2015-01-03      1        2
      2015-01-04      0       -1
      2015-01-05      1        1
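
The key step is the run labelling done by g.ne(g.shift()).cumsum(). The sketch below breaks it down on the MSFT values alone, and also shows one possible variation that derives the counter with cumcount and where instead of apply and cumsum. The variable names (changed, run_id, position, counter) are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Dummy values for a single stock (MSFT in the example above).
s = pd.Series([1, 1, 0, 1], name='Dummy')

# True wherever the value differs from the previous row, i.e. a new run starts.
changed = s.ne(s.shift())
# Cumulative sum of the run starts gives one id per run.
run_id = changed.cumsum()

# 1-based position inside each run, negated for runs of 0s.
position = s.groupby(run_id).cumcount() + 1
counter = position.where(s.eq(1), -position)

print(pd.DataFrame({'Dummy': s, 'changed': changed,
                    'run_id': run_id, 'Counter': counter}))
```

For a multi-index frame, the same steps would be applied per stock, as the groupby('stock') call in the answer does.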


08-11 14:07