Problem description
I've had success using the groupby function to sum or average a given variable by group, but is there a way to aggregate into a list of values rather than getting a single result? (And would this still be called aggregation?)
I am not entirely sure this is the approach I should be taking anyhow, so below is an example of the transformation I'd like to make, with toy data.
That is, if the data look something like this:
A B C
1 10 22
1 12 20
1 11 8
1 10 10
2 11 13
2 12 10
3 14 0
What I am trying to end up with is something like the following. I am not totally sure whether this can be done through groupby aggregating into lists, and am rather lost as to where to go from here.
Hypothetical output:
A B C New1 New2 New3 New4 New5 New6
1 10 22 12 20 11 8 10 10
2 11 13 12 10
3 14 0
Perhaps I should be pursuing pivots instead? The order in which the data are put into columns does not matter - all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.
Recommended answer
My solution is a bit longer than you might expect, and I'm sure it could be shortened, but:
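# df is the DataFrame with columns A, B and C
# stack B and C into one column within each group of A
g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"])))
k = g.reset_index()
k["i"] = k.index  # sequential row position as a column
k["rn"] = k.groupby("A")["i"].rank()  # row number within each A
# index=/columns= replace the rows=/cols= arguments of old pandas versions
k.pivot_table(index="A", columns="rn", values=0)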
# output
# rn 1 2 3 4 5 6
# A
# 1 10 12 11 22 20 8
# 2 10 11 10 13 NaN NaN
# 3 14 10 NaN NaN NaN NaN
A bit of explanation. The first line, g = df.groupby("A").apply(lambda x: pd.concat((x["B"], x["C"]))), groups df by A and then stacks columns B and C into a single column:
A
1 0 10
1 12
2 11
0 22
1 20
2 8
2 3 10
4 11
3 10
4 13
3 5 14
5 10
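For comparison, if a plain list per group is what is wanted (the literal question), a minimal sketch could look like the following. The sample frame is an assumption: it is reconstructed from the intermediate output shown in this answer, which differs slightly from the toy data in the question. The names b_lists and combined are only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the intermediate output above.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3],
    "B": [10, 12, 11, 10, 11, 14],
    "C": [22, 20, 8, 10, 13, 10],
})

# One list of B values per group of A.
b_lists = df.groupby("A")["B"].agg(list)

# One combined list per group: all B values first, then all C values,
# mirroring the order produced by pd.concat((x["B"], x["C"])).
combined = pd.Series(
    {a: grp["B"].tolist() + grp["C"].tolist() for a, grp in df.groupby("A")}
)

print(b_lists)
print(combined)
```

This keeps one row per value of A with a Python list in each cell, which is handy for inspection but less convenient than separate columns for further numeric work.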
Then k = g.reset_index() creates a sequential index; the result is:
A level_1 0
0 1 0 10
1 1 1 12
2 1 2 11
3 1 0 22
4 1 1 20
5 1 2 8
6 2 3 10
7 2 4 11
8 2 3 10
9 2 4 13
10 3 5 14
11 3 5 10
Now I want to move this index into a column (I'd like to hear how I can make a sequential column without resetting the index), k["i"] = k.index:
A level_1 0 i
0 1 0 10 0
1 1 1 12 1
2 1 2 11 2
3 1 0 22 3
4 1 1 20 4
5 1 2 8 5
6 2 3 10 6
7 2 4 11 7
8 2 3 10 8
9 2 4 13 9
10 3 5 14 10
11 3 5 10 11
Now, k["rn"] = k.groupby("A")["i"].rank() adds a row number within each A (like row_number() over(partition by A order by i) in SQL):
A level_1 0 i rn
0 1 0 10 0 1
1 1 1 12 1 2
2 1 2 11 2 3
3 1 0 22 3 4
4 1 1 20 4 5
5 1 2 8 5 6
6 2 3 10 6 1
7 2 4 11 7 2
8 2 3 10 8 3
9 2 4 13 9 4
10 3 5 14 10 1
11 3 5 10 11 2
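As a side note, the per-group row number can probably be obtained without the helper column i, using groupby(...).cumcount(). The sketch below is not part of the original answer; the frame k here is a hypothetical stand-in holding only the A and 0 columns from the table above.

```python
import pandas as pd

# Hypothetical stand-in for k (only the columns needed for this step).
k = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    0: [10, 12, 11, 22, 20, 8, 10, 11, 10, 13, 14, 10],
})

# cumcount() numbers the rows 0, 1, 2, ... within each group in their
# current order; adding 1 makes the numbering start at 1 like row_number().
k["rn"] = k.groupby("A").cumcount() + 1

print(k)
```

Unlike rank(), cumcount() returns integers and does not depend on the values of another column, only on the row order within each group.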
Finally, just pivot with k.pivot_table(index="A", columns="rn", values=0) (old pandas versions spelled these arguments rows and cols):
rn 1 2 3 4 5 6
A
1 10 12 11 22 20 8
2 10 11 10 13 NaN NaN
3 14 10 NaN NaN NaN NaN
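Since the answer notes that the steps could be shortened, here is one possible shorter variant based on melt, cumcount and pivot. It is a sketch and not the original author's method; the sample frame is again an assumption reconstructed from the intermediate output above, and long_df and wide are illustrative names.

```python
import pandas as pd

# Assumed sample data, reconstructed from the intermediate output above.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 3],
    "B": [10, 12, 11, 10, 11, 14],
    "C": [22, 20, 8, 10, 13, 10],
})

# Stack B and C into one "value" column (all B rows first, then all C rows).
long_df = df.melt(id_vars="A", value_vars=["B", "C"])

# Number the values within each group of A, starting at 1.
long_df["rn"] = long_df.groupby("A").cumcount() + 1

# Spread the numbered values into columns; shorter groups are padded with NaN.
wide = long_df.pivot(index="A", columns="rn", values="value")

print(wide)
```

Because shorter groups are padded with NaN, the resulting columns are floating point; that matches the NaN cells in the pivoted output above.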