Problem description
For the DataFrame
In [2]: df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
...: 'Rank': np.random.randint(0,3,6),
...: 'Val': np.random.rand(6)})
...: df
Out[2]:
Name Rank Val
0 foo 0 0.299397
1 bar 0 0.909228
2 foo 0 0.517700
3 bar 0 0.929863
4 foo 1 0.209324
5 bar 2 0.381515
I'm interested in grouping by Name and Rank and possibly getting aggregate values:
In [3]: group = df.groupby(['Name', 'Rank'])
In [4]: agg = group.agg(sum)
In [5]: agg
Out[5]:
Val
Name Rank
bar 0 1.839091
2 0.381515
foo 0 0.817097
1 0.209324
But I would like to add a column to the original df
that contains the group number for each row, like
In [13]: df['Group_id'] = [2, 0, 2, 0, 3, 1]
In [14]: df
Out[14]:
Name Rank Val Group_id
0 foo 0 0.299397 2
1 bar 0 0.909228 0
2 foo 0 0.517700 2
3 bar 0 0.929863 0
4 foo 1 0.209324 3
5 bar 2 0.381515 1
Is there a good way to do this in pandas?
I can get it with plain Python,
In [16]: from itertools import count
In [17]: c = count()
In [22]: group.transform(lambda x: next(c))
Out[22]:
Val
0 2
1 0
2 2
3 0
4 3
5 1
but it's pretty slow on a large DataFrame, so I figured there may be a better built-in pandas way to do this.
A lot of handy things are stored in the DataFrameGroupBy.grouper
object. For example:
>>> df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
'Rank': np.random.randint(0,3,6),
'Val': np.random.rand(6)})
>>> grouped = df.groupby(["Name", "Rank"])
>>> grouped.grouper.
grouped.grouper.agg_series grouped.grouper.indices
grouped.grouper.aggregate grouped.grouper.labels
grouped.grouper.apply grouped.grouper.levels
grouped.grouper.axis grouped.grouper.names
grouped.grouper.compressed grouped.grouper.ngroups
grouped.grouper.get_group_levels grouped.grouper.nkeys
grouped.grouper.get_iterator grouped.grouper.result_index
grouped.grouper.group_info grouped.grouper.shape
grouped.grouper.group_keys grouped.grouper.size
grouped.grouper.groupings grouped.grouper.sort
grouped.grouper.groups
and so:
>>> df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.group_info[0]
>>> df
Name Rank Val GroupId
0 foo 0 0.302482 2
1 bar 0 0.375193 0
2 foo 2 0.965763 4
3 bar 2 0.166417 1
4 foo 1 0.495124 3
5 bar 2 0.728776 1
There may be a nicer alias for grouper.group_info[0]
lurking around somewhere, but this should work anyway.
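On the "nicer alias" point: recent pandas versions expose this numbering as a public method, GroupBy.ngroup(), so there is no need to reach into grouper, which is an internal attribute and may change between releases. Below is a minimal sketch, assuming a pandas version that has ngroup(). The fixed Rank and Val values are copied from the question's printed frame (the original used random data), and the column name Group_id is the one the question asked for.

```python
import pandas as pd

# Same shape as the question's frame, with fixed values so the result is reproducible.
df = pd.DataFrame({
    "Name": ["foo", "bar"] * 3,
    "Rank": [0, 0, 0, 0, 1, 2],
    "Val": [0.299397, 0.909228, 0.517700, 0.929863, 0.209324, 0.381515],
})

# ngroup() numbers the groups in sorted key order (sort=True is the groupby default)
# and returns one integer per row, aligned with df's index.
df["Group_id"] = df.groupby(["Name", "Rank"]).ngroup()

print(df)
```

With these values the new column comes out as 2, 0, 2, 0, 3, 1, which matches the hand-written Group_id in the question. If you would rather number the groups in order of first appearance, passing sort=False to groupby should do that.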