Problem description
I have a pandas DataFrame with a datetime, an integer, and a string column:
from io import StringIO
import pandas as pd
data1 = """Year N X
2008-01-01 2 A
2008-01-01 3 B
2008-01-01 6 C
2008-01-01 2 D
2010-01-01 7 A
2010-01-01 1 B
2010-01-01 8 C
2012-01-01 9 A
2012-01-01 4 B
2016-01-01 1 A"""
# sep=r"\s+" splits on any whitespace; delim_whitespace=True is deprecated in recent pandas
df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])
I can aggregate column N for count, min, and max simply as:
df1 = df.groupby("X")["N"].agg(Count="count", Min="min", Max="max").reset_index()
print(df1)
   X  Count  Min  Max
0  A      4    1    9
1  B      3    1    4
2  C      2    6    8
3  D      1    2    2
Is there a way to achieve the same for column Year, displaying only the year? I can do this in several steps:
g = df.groupby("X")["Year"]
df2 = g.agg(Count="count").reset_index()
df2["Start_date"] = g.min().dt.year.values
df2["End_date"] = g.max().dt.year.values
print(df2)
   X  Count  Start_date  End_date
0  A      4        2008      2016
1  B      3        2008      2012
2  C      2        2008      2010
3  D      1        2008      2008
But a version similar to the one above for N, such as
df2 = df.groupby("X")["Year"].agg(Count="count", Min="min().dt.year.values", Max="max().dt.year.values").reset_index()
obviously does not work. Is there a simpler way to aggregate the first and last year in a pandas groupby (apart from the obvious approach of first extracting the min/max dates as above, then converting the datetime columns into year columns)?
Recommended answer
Have you tried using GroupBy.agg with named aggregation?
df.assign(Year=pd.to_datetime(df['Year']).dt.year).groupby('X').agg(
N=('N', 'count'), Start_date=('Year', 'first'), End_date=('Year', 'last'),)
   N  Start_date  End_date
X
A  4        2008      2016
B  3        2008      2012
C  2        2008      2010
D  1        2008      2008
If the dates aren't in ascending order, use 'min' and 'max' instead of 'first' and 'last', respectively.
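To see why the ordering matters, here is a minimal sketch using a small made-up frame (not the data from the question) whose rows are deliberately out of date order. The names `years`, `by_position`, and `by_value` are illustrative only.

```python
import pandas as pd

# Hypothetical data: the rows within each group are NOT sorted by date.
df = pd.DataFrame({
    "Year": pd.to_datetime(
        ["2016-01-01", "2008-01-01", "2012-01-01", "2010-01-01", "2008-01-01"]
    ),
    "N": [1, 2, 9, 1, 3],
    "X": ["A", "A", "A", "B", "B"],
})

# Replace the datetime column by its year before grouping.
years = df.assign(Year=df["Year"].dt.year)

# 'first'/'last' pick values by row position within each group.
by_position = years.groupby("X").agg(
    Start_date=("Year", "first"), End_date=("Year", "last")
)

# 'min'/'max' pick values by magnitude, regardless of row order.
by_value = years.groupby("X").agg(
    Start_date=("Year", "min"), End_date=("Year", "max")
)

print(by_position)
print(by_value)
```

For group A, 'first' and 'last' return 2016 and 2012 (the first and last rows as they appear), while 'min' and 'max' return 2008 and 2016. Sorting by Year beforehand would make the two variants agree.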
This approach avoids lambda expressions in the aggregation step, so it is quite performant. More on named aggregation can be found in my post here.
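A related variant, sketched here as one possible alternative rather than as part of the original answer, keeps the datetimes during the groupby, aggregates with the built-in 'min' and 'max', and converts only the two result columns to years afterwards. The variable name `result` is an assumption. The lambdas passed to `assign` are each called once on the aggregated frame, not once per group.

```python
from io import StringIO
import pandas as pd

data1 = """Year N X
2008-01-01 2 A
2008-01-01 3 B
2008-01-01 6 C
2008-01-01 2 D
2010-01-01 7 A
2010-01-01 1 B
2010-01-01 8 C
2012-01-01 9 A
2012-01-01 4 B
2016-01-01 1 A"""
df = pd.read_csv(StringIO(data1), sep=r"\s+", parse_dates=["Year"])

result = (
    df.groupby("X")["Year"]
      # Built-in string aggregations on the raw datetime column.
      .agg(Count="count", Start_date="min", End_date="max")
      # Convert the aggregated datetimes to plain years in one pass.
      .assign(
          Start_date=lambda d: d["Start_date"].dt.year,
          End_date=lambda d: d["End_date"].dt.year,
      )
      .reset_index()
)
print(result)
```

This should yield the same Count, Start_date, and End_date values as the step-by-step version in the question, in a single chained expression.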