Simplified pandas groupby aggregation for a datetime column

Question

I have this pandas DataFrame with a datetime, an integer, and a string column:

from io import StringIO
import pandas as pd
data1 =  """Year        N   X
            2008-01-01  2   A
            2008-01-01  3   B
            2008-01-01  6   C
            2008-01-01  2   D
            2010-01-01  7   A
            2010-01-01  1   B
            2010-01-01  8   C
            2012-01-01  9   A
            2012-01-01  4   B
            2016-01-01  1   A"""

# sep=r"\s+" replaces delim_whitespace=True, which is deprecated since pandas 2.2
df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])

I can aggregate column N for count, min, and max simply as:

df1 = df.groupby("X")["N"].agg(Count="count", Min="min", Max="max").reset_index()
print(df1)

   X  Count  Min  Max
0  A      4    1    9
1  B      3    1    4
2  C      2    6    8
3  D      1    2    2

Is there a way to achieve the same for column Year, displaying only the year? I can do this in several steps:

g = df.groupby("X")["Year"]
df2 = g.agg(Count="count").reset_index()
df2["Start_date"] = g.min().dt.year.values
df2["End_date"] = g.max().dt.year.values
print(df2)

   X  Count  Start_date  End_date
0  A      4        2008      2016
1  B      3        2008      2012
2  C      2        2008      2010
3  D      1        2008      2008

But a version similar to the one above for N, such as

df2 = df.groupby("X")["Year"].agg(Count="count", Min="min().dt.year.values", Max="max().dt.year.values").reset_index()

obviously does not work, because agg expects each string to be the name of a single aggregation method, not a chain of calls. Is there a simpler way to aggregate the first and last year in a pandas groupby (apart from the obvious approach of first extracting the min/max dates as above, then converting the datetime columns into year columns)?
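For reference, the "obvious approach" mentioned in the question (aggregate the dates first, then convert the resulting datetime columns to years) could be sketched as follows. This is only an illustration using the question's data; the use of assign for the conversion step is an assumption about how one might write it, not code from the original post.

```python
from io import StringIO
import pandas as pd

data1 = """Year        N   X
2008-01-01  2   A
2008-01-01  3   B
2008-01-01  6   C
2008-01-01  2   D
2010-01-01  7   A
2010-01-01  1   B
2010-01-01  8   C
2012-01-01  9   A
2012-01-01  4   B
2016-01-01  1   A"""

df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])

# Step 1: aggregate the datetime column with built-in aggregations.
df2 = (
    df.groupby("X")["Year"]
    .agg(Count="count", Start_date="min", End_date="max")
    .reset_index()
)

# Step 2: convert the aggregated datetime columns to plain years.
df2 = df2.assign(
    Start_date=df2["Start_date"].dt.year,
    End_date=df2["End_date"].dt.year,
)
print(df2)
```

The conversion runs on the small aggregated frame rather than on every input row, which is the main difference from converting Year before grouping.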

Recommended answer

Have you tried using GroupBy.agg with named aggregation?

df.assign(Year=pd.to_datetime(df['Year']).dt.year).groupby('X').agg(
    N=('N', 'count'), Start_date=('Year', 'first'), End_date=('Year', 'last'),)

   N  Start_date  End_date
X
A  4        2008      2016
B  3        2008      2012
C  2        2008      2010
D  1        2008      2008

'first' and 'last' pick values by row position within each group, so if the dates aren't in ascending order, use 'min' and 'max' instead of 'first' and 'last', respectively.
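To illustrate the difference, here is a small sketch with deliberately unsorted dates. The data frame below is made up for this example and is not the question's data.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame({
    "X": ["A", "A", "A", "B", "B"],
    "Year": pd.to_datetime(
        ["2012-01-01", "2008-01-01", "2016-01-01", "2010-01-01", "2008-01-01"]
    ),
})

grouped = df.assign(Year=df["Year"].dt.year).groupby("X")

# 'first'/'last' depend on row order within each group.
positional = grouped.agg(Start_date=("Year", "first"), End_date=("Year", "last"))

# 'min'/'max' depend only on the values.
ordered = grouped.agg(Start_date=("Year", "min"), End_date=("Year", "max"))

print(positional)
print(ordered)
```

With this input, 'first' reports 2012 as the start for group A because that row happens to come first, while 'min' reports 2008.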

This approach avoids lambda expressions in the aggregation, so pandas can use its built-in aggregation functions, which is quite performant. More on named aggregation can be found in my post here.
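For comparison, the lambda-based variant that the answer avoids might look like the sketch below. It is an assumption about what such a version would be, not code from the original post; it calls a Python function once per group, which is why the built-in string aggregations are generally preferred for larger data.

```python
from io import StringIO
import pandas as pd

data1 = """Year        N   X
2008-01-01  2   A
2008-01-01  3   B
2008-01-01  6   C
2008-01-01  2   D
2010-01-01  7   A
2010-01-01  1   B
2010-01-01  8   C
2012-01-01  9   A
2012-01-01  4   B
2016-01-01  1   A"""

df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])

# Named aggregation with lambdas: each lambda receives the group's Year
# values as a Series and returns the year of its earliest or latest date.
out = df.groupby("X").agg(
    Count=("Year", "count"),
    Start_date=("Year", lambda s: s.min().year),
    End_date=("Year", lambda s: s.max().year),
)
print(out)
```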

08-11 11:21