Can pandas count a string-typed column on a RollingGroupby object?

As the title says: can pandas count a string-typed column on a RollingGroupby object?

Here is my dataframe:

# Let's say my objective is to count the number of unique cars
# over the last 1 day grouped by park

 park |    date    | to_count
------------------------------
  A   | 2019-01-01 |   Honda
  A   | 2019-01-03 |   Lexus
  A   | 2019-01-05 |   BMW
  A   | 2019-01-05 |   Lexus
  B   | 2019-01-01 |   BMW
  B   | 2019-01-08 |   Lexus
  B   | 2019-01-08 |   Lexus
  B   | 2019-01-10 |   Ford


This is what I want:

 park |    date    | unique_count
----------------------------------
  A   | 2019-01-01 |      1
  A   | 2019-01-03 |      1
  A   | 2019-01-05 |      2
  B   | 2019-01-01 |      1
  B   | 2019-01-08 |      1
  B   | 2019-01-10 |      1

# A bit of explanation:
# There are 2 types of cars coming to park A over the last 1 day on 5th Jan, so the distinct count is 2.
# There are 2 cars of 1 type (Lexus) coming to park B over the last 1 day on 8th Jan, so the distinct count is 1.


This is what I have tried:

import pandas as pd
import numpy as np

# initiate dataframe
df = pd.DataFrame({
    'park': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-05',
             '2019-01-01', '2019-01-08', '2019-01-08', '2019-01-10'],
    'to_count': ['Honda', 'Lexus', 'BMW', 'Lexus', 'BMW', 'Lexus', 'Lexus', 'Ford']
})

# string to date
df['date'] = pd.to_datetime(df['date'])

# group. This is more intuitive to me but sadly this does not work.
unique_count = df.groupby('park').rolling('1d', on='date').to_count.nunique()

# factorize then group. This works (but why???)
df['factorized'] = pd.factorize(df.to_count)[0]
unique_count = df.groupby('park').rolling('1d', on='date').factorized.apply(lambda x: len(np.unique(x)) )

result = unique_count.reset_index().drop_duplicates(subset=['park', 'date'], keep='last')


Here is my environment:


Mac 10.12 High Sierra
Python 3.6
pandas 0.22.0


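To emphasize: I need the rolling window functionality to work. In this example the window happens to be 1 day, but I may want it to work for 3 days, 7 days, 2 hours, or 5 seconds.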

Best answer

Try this:
- First, group the dataframe by park and date
- Then aggregate by the number of unique values of to_count

import pandas as pd

df = pd.DataFrame({
    'park': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-05',
             '2019-01-01', '2019-01-08', '2019-01-08', '2019-01-10'],
    'to_count': ['Honda', 'Lexus', 'BMW', 'Lexus', 'BMW', 'Lexus', 'Lexus', 'Ford']
})

agg_df = df.groupby(by=['park', 'date']).agg({'to_count': pd.Series.nunique}).reset_index()
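Grouping by park and date only matches the question's 1-day case, because the data has one-day granularity. For other windows (3 days, 2 hours, and so on), one possible sketch is to combine the factorize trick from the question with a time-based rolling window. Rolling operations work on numeric data, which is presumably why the factorized column works where the string column does not. The helper name `rolling_unique_count` and the `code` column are made up for illustration, and `raw=True` assumes a pandas version newer than the 0.22.0 mentioned in the question.

```python
import numpy as np
import pandas as pd


def rolling_unique_count(df, window):
    """Count distinct `to_count` values per park over a trailing time window.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    out['date'] = pd.to_datetime(out['date'])
    # Rolling works on numbers, so map each string to an integer code.
    out['code'] = pd.factorize(out['to_count'])[0]
    # Time-based windows need a monotonic datetime index; a stable sort
    # keeps the original order of rows that share a date.
    out = out.sort_values('date', kind='mergesort').set_index('date')
    counts = (
        out.groupby('park')['code']
           .rolling(window)
           .apply(lambda codes: len(np.unique(codes)), raw=True)
    )
    # Rows sharing a timestamp each get their own value; keep the last one
    # per (park, date), since it has seen every row up to that timestamp.
    return (
        counts.groupby(level=['park', 'date']).last()
              .astype(int)
              .rename('unique_count')
              .reset_index()
    )


df = pd.DataFrame({
    'park': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date': ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-05',
             '2019-01-01', '2019-01-08', '2019-01-08', '2019-01-10'],
    'to_count': ['Honda', 'Lexus', 'BMW', 'Lexus', 'BMW', 'Lexus', 'Lexus', 'Ford']
})

one_day = rolling_unique_count(df, '1D')
three_days = rolling_unique_count(df, '3D')
print(one_day)
print(three_days)
```

With a '1D' window this should reproduce the table the question asks for; changing the window string (for example '7D' or '2h') changes only the look-back period. Applying a Python lambda per window is slow on large data, so treat this as a starting point rather than a tuned solution.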

This question, "python - Can pandas count a string-typed column on a RollingGroupby object?", comes from Stack Overflow: https://stackoverflow.com/questions/54413686/