Using Pandas GroupBy and size()/count() to generate an aggregated DataFrame
Question
I currently have a DataFrame called df that looks like this:
date tag
2011-02-18 12:57:00-07:00 A
2011-02-19 12:57:00-07:00 A
2011-03-18 12:57:00-07:00 B
2011-04-01 12:57:00-07:00 C
2011-05-19 12:57:00-07:00 Z
2011-06-03 12:57:00-07:00 A
2011-06-05 12:57:00-07:00 A
...
I'm trying to group by the tag and by the date (year/month), so that the result looks like this:
date A B C Z
2011-02 2 0 0 0
2011-03 0 1 0 0
2011-04 0 0 1 0
2011-05 0 0 0 1
2011-06 2 0 0 0
...
I've tried the following, but it doesn't quite give me what I want:
grouped_series = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
I know in advance which tags exist. Any help would be greatly appreciated.
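To make the problem reproducible, here is a minimal sketch that rebuilds the seven sample rows shown in the question (the elided "..." rows are left out) and runs the same attempt. It assumes the date column is parsed with pd.to_datetime, which the question does not state. The result is a Series with a two-level index (month, tag), which is why it does not yet look like the desired table with one column per tag.

```python
import pandas as pd

# Sample rows copied from the question; parsing with pd.to_datetime is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00-07:00",
        "2011-02-19 12:57:00-07:00",
        "2011-03-18 12:57:00-07:00",
        "2011-04-01 12:57:00-07:00",
        "2011-05-19 12:57:00-07:00",
        "2011-06-03 12:57:00-07:00",
        "2011-06-05 12:57:00-07:00",
    ]),
    "tag": ["A", "A", "B", "C", "Z", "A", "A"],
})

# Same grouping keys as in the question: a "year-month" string per row, plus the tag.
months = ["%s-%s" % (d.year, d.month) for d in df.date]
grouped_series = df.groupby([months, df.tag]).size()

# size() returns a long-format Series indexed by (month, tag) pairs,
# containing only the combinations that actually occur in the data.
print(type(grouped_series).__name__)
print(grouped_series.index.nlevels)
print(grouped_series)
```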
Update (for future readers):
I ended up keeping the datetime instead of the string format. Trust me, this will be better when plotting:
import datetime
grouped_df = df.groupby([[datetime.datetime(d.year, d.month, 1, 0, 0) for d in df.date], df.tag]).size()
grouped_df = grouped_df.unstack().fillna(0)
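The month-start grouping above can also be sketched with the pandas .dt accessor instead of a list comprehension. This is an alternative under stated assumptions, not the author's method: the sample rows are copied from the question, the date column is assumed to be parsed with pd.to_datetime, and the UTC offset is dropped (keeping local wall-clock time) before converting to monthly periods. The name month_start is introduced here for illustration.

```python
import pandas as pd

# Sample rows copied from the question; parsing with pd.to_datetime is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00-07:00",
        "2011-02-19 12:57:00-07:00",
        "2011-03-18 12:57:00-07:00",
        "2011-04-01 12:57:00-07:00",
        "2011-05-19 12:57:00-07:00",
        "2011-06-03 12:57:00-07:00",
        "2011-06-05 12:57:00-07:00",
    ]),
    "tag": ["A", "A", "B", "C", "Z", "A", "A"],
})

# Drop the UTC offset, convert each timestamp to its monthly period,
# then back to a timestamp at the first instant of that month.
month_start = (
    df["date"].dt.tz_localize(None).dt.to_period("M").dt.to_timestamp()
)

# Count rows per (month, tag), pivot tags into columns, fill gaps with 0.
grouped_df = df.groupby([month_start, df["tag"]]).size().unstack().fillna(0)
print(grouped_df)
```

Because the index holds real timestamps rather than strings, plotting libraries can place the months on a proper time axis.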
Recommended answer
You could use the unstack() and fillna() methods:
>>> g = df.groupby([["%s-%s" % (d.year, d.month) for d in df.date], df.tag]).size()
>>> g
tag
2011-2 A 2
2011-3 B 1
2011-4 C 1
2011-5 Z 1
2011-6 A 2
dtype: int64
>>> g.unstack().fillna(0)
tag A B C Z
2011-2 2 0 0 0
2011-3 0 1 0 0
2011-4 0 0 1 0
2011-5 0 0 0 1
2011-6 2 0 0 0
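Two variants of the same idea are sketched below, under the assumption that the sample rows from the question are parsed with pd.to_datetime. In recent pandas versions, unstack() followed by fillna(0) usually leaves the counts as floats, because the intermediate NaN values force an upcast. Passing fill_value=0 to unstack() avoids that step, and pd.crosstab builds the same count table directly. The month label uses strftime("%Y-%m") so that it matches the zero-padded "2011-02" format requested in the question. The names month, counts, and table are illustrative.

```python
import pandas as pd

# Sample rows copied from the question; parsing with pd.to_datetime is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-02-18 12:57:00-07:00",
        "2011-02-19 12:57:00-07:00",
        "2011-03-18 12:57:00-07:00",
        "2011-04-01 12:57:00-07:00",
        "2011-05-19 12:57:00-07:00",
        "2011-06-03 12:57:00-07:00",
        "2011-06-05 12:57:00-07:00",
    ]),
    "tag": ["A", "A", "B", "C", "Z", "A", "A"],
})

# Zero-padded "YYYY-MM" label per row.
month = df["date"].dt.strftime("%Y-%m")

# Variant 1: fill missing (month, tag) combinations while unstacking,
# so the counts keep their integer dtype.
counts = df.groupby([month, df["tag"]]).size().unstack(fill_value=0)

# Variant 2: pd.crosstab counts how often each (month, tag) pair occurs.
table = pd.crosstab(month, df["tag"])

print(counts)
print(table)
```

Either variant should give one row per month and one column per tag; which one reads better is largely a matter of taste.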