Problem description
Suppose you have the following SQL table:
A B C
2 1 4
3 4 5
3 1 1
1 4 0
5 0 1
And you want to add/show a column containing the mean (or any other aggregate function) of column A for each distinct value of column B. You want to keep all columns. So the result would look like this:
A B C avg(A)|B
2 1 4 2.5
3 4 5 2.0
3 1 1 2.5
1 4 0 2.0
5 0 1 5.0
The best way to do it in pandas, as far as I know, would be:
>>> df['avg(A)|B'] = df.groupby('B')['A'].transform('mean')
>>> df
   A  B  C  avg(A)|B
0  2  1  4       2.5
1  3  4  5       2.0
2  3  1  1       2.5
3  1  4  0       2.0
4  5  0  1       5.0
How would you do it in SQL? Can one avoid using a JOIN?
You can join to a derived table that contains the aggregate value for each group of b:
select t1.*, t2.avg_a
from mytable t1
join (
    select b, avg(a) as avg_a
    from mytable
    group by b
) t2 on t2.b = t1.b
Or use a correlated subquery, which avoids the join:
select t1.*, (select avg(t2.a) from mytable t2 where t2.b = t1.b) as avg_a
from mytable t1
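To check that the two variants above return the same rows, here is a minimal sketch that runs both against the example data. It assumes an in-memory SQLite database driven from Python's standard `sqlite3` module, which is only a stand-in for MySQL or Postgres. The column alias `avg_a` and the `order by rowid` clause are illustrative choices, not part of the question.

```python
import sqlite3

# In-memory SQLite database holding the example table from the question.
# (Assumption: SQLite stands in for MySQL/Postgres for demonstration only.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)],
)

# Variant 1: join each row to a derived table of per-group averages.
join_sql = """
    SELECT t1.a, t1.b, t1.c, t2.avg_a
    FROM mytable t1
    JOIN (SELECT b, AVG(a) AS avg_a FROM mytable GROUP BY b) t2
      ON t2.b = t1.b
    ORDER BY t1.rowid
"""

# Variant 2: correlated subquery, evaluated against each outer row's b.
subquery_sql = """
    SELECT t1.a, t1.b, t1.c,
           (SELECT AVG(t2.a) FROM mytable t2 WHERE t2.b = t1.b) AS avg_a
    FROM mytable t1
    ORDER BY t1.rowid
"""

join_rows = conn.execute(join_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()

# Print the rows in insertion order (SQLite's implicit rowid).
for row in join_rows:
    print(row)
```

Both queries keep every original row and attach the group mean, which is what `transform('mean')` does in pandas. Which one performs better depends on the database and its optimizer, so it is worth checking the query plan on real data.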
The question is tagged both mysql and psql, so I'm not sure which database you're using. On Postgres you can use a window function (MySQL also supports these from version 8.0):
select *, avg(a) over (partition by b) as avg_a
from mytable
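The window-function form extends to any other aggregate, as the question asks. Below is a small sketch under the same assumption as before: an in-memory SQLite database (version 3.25 or later, which added window functions) stands in for Postgres. The aliases `avg_a`, `max_a`, and `n_rows` are made up for illustration.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.25+; older builds lack window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)],
)

# Each aggregate is computed per partition of b and repeated on every row
# of that partition, so no row is collapsed and no join is needed.
window_sql = """
    SELECT a, b, c,
           AVG(a)   OVER (PARTITION BY b) AS avg_a,
           MAX(a)   OVER (PARTITION BY b) AS max_a,
           COUNT(*) OVER (PARTITION BY b) AS n_rows
    FROM mytable
    ORDER BY rowid
"""
window_rows = conn.execute(window_sql).fetchall()

for row in window_rows:
    print(row)
```

This is the closest SQL counterpart to `df.groupby('B')['A'].transform(...)`: one pass over the table, with the aggregate broadcast back to each row of its group.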