What is the SQL equivalent of pandas "transform"?

Problem description

Suppose you have the following SQL table:

  A  B  C
  2  1  4
  3  4  5
  3  1  1
  1  4  0
  5  0  1

And you want to add/show a column containing the mean (or any other aggregate function) of column A for each distinct value of column B. You want to keep all columns. So the result would look like this:

  A  B  C    avg(A)|B
  2  1  4       2.5
  3  4  5       2.0
  3  1  1       2.5
  1  4  0       2.0
  5  0  1       5.0

The best way to do it in pandas, as far as I know, would be:

>>> df['avg(A)|B'] = df.groupby('B')['A'].transform('mean')
>>> df
   A  B  C  avg(A)|B
0  2  1  4       2.5
1  3  4  5       2.0
2  3  1  1       2.5
3  1  4  0       2.0
4  5  0  1       5.0
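Conceptually, `transform('mean')` aggregates per group and then broadcasts each group's result back onto every row of that group, so the output has the same length and order as the input. The minimal pure-Python sketch below illustrates that idea on the sample data. The helper `transform_mean` is a hypothetical name for illustration, not a pandas API, and it ignores details such as how pandas treats missing values.

```python
from collections import defaultdict

# Sample rows from the table above, as (A, B, C) tuples.
rows = [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)]


def transform_mean(rows, key_idx, val_idx):
    """Return one group mean per input row, in the original row order.

    Hypothetical helper mirroring the idea behind
    groupby(...).transform('mean'): aggregate per group, then broadcast
    the group result back onto every member row.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Pass 1: accumulate sum and count per group key.
    for row in rows:
        sums[row[key_idx]] += row[val_idx]
        counts[row[key_idx]] += 1
    # Pass 2: look up each row's group mean, keeping row count and order.
    return [sums[row[key_idx]] / counts[row[key_idx]] for row in rows]


# Group by column B (index 1) and average column A (index 0).
avg_a_by_b = transform_mean(rows, key_idx=1, val_idx=0)
for row, avg in zip(rows, avg_a_by_b):
    print(*row, avg)
```

The two passes correspond to the two things SQL has to do as well: compute the aggregate per group, then attach it to every original row.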

How would you do it in SQL? Can one avoid using a JOIN?

Solution

You can join to a derived table that contains the aggregate value for each group of b:

select t1.*, t2.avg_a
from mytable t1
join (
    -- one row per distinct b, holding the mean of a for that group
    select b, avg(a) as avg_a
    from mytable
    group by b
) t2 on t2.b = t1.b
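A minimal way to try the join against the sample data is Python's built-in `sqlite3` module with an in-memory database. This is an assumption for illustration only: the question concerns MySQL or PostgreSQL, and SQLite's `rowid` is used here just to return rows in insertion order, since a join does not guarantee any order.

```python
import sqlite3

# In-memory SQLite database standing in for "mytable" (assumption: the
# original question targets MySQL or PostgreSQL, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (a integer, b integer, c integer)")
conn.executemany(
    "insert into mytable (a, b, c) values (?, ?, ?)",
    [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)],
)

# Join each row to the per-group mean computed in a derived table.
query = """
select t1.a, t1.b, t1.c, t2.avg_a
from mytable t1
join (
    select b, avg(a) as avg_a
    from mytable
    group by b
) t2 on t2.b = t1.b
order by t1.rowid  -- SQLite-specific: restore insertion order
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```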

Or use a correlated subquery, which avoids an explicit JOIN:

select t1.*,
       (select avg(a) from mytable t2 where t2.b = t1.b) as avg_a
from mytable t1
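The correlated subquery can be checked the same way. This sketch again assumes an in-memory SQLite database through Python's `sqlite3` module rather than MySQL or PostgreSQL. The scalar subquery is conceptually evaluated once per outer row, using that row's `b`, so no JOIN appears in the query text.

```python
import sqlite3

# In-memory SQLite stand-in for "mytable" (assumption, not MySQL/PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (a integer, b integer, c integer)")
conn.executemany(
    "insert into mytable (a, b, c) values (?, ?, ?)",
    [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)],
)

# For each outer row t1, the subquery averages a over rows sharing t1.b.
query = """
select t1.a, t1.b, t1.c,
       (select avg(t2.a) from mytable t2 where t2.b = t1.b) as avg_a
from mytable t1
order by t1.rowid  -- SQLite-specific: keep insertion order
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```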

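The question is tagged both mysql and psql, so I'm not sure which database you're using, but on PostgreSQL (and on MySQL 8.0 or later) you can use a window function. It keeps every row and attaches the per-partition aggregate, which is the closest analogue to transform: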

select t.*, avg(a) over (partition by b) as avg_a
from mytable t
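SQLite also supports window functions (from version 3.25.0), so the window query can be tried with Python's `sqlite3` module too. As in a plain SQL session, SQLite stands in for PostgreSQL here as an assumption; the version check only documents the minimum SQLite release the query needs.

```python
import sqlite3

# Window functions require SQLite 3.25.0 or newer.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite >= 3.25.0 is required for window functions")

# In-memory SQLite stand-in for "mytable" (assumption, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (a integer, b integer, c integer)")
conn.executemany(
    "insert into mytable (a, b, c) values (?, ?, ?)",
    [(2, 1, 4), (3, 4, 5), (3, 1, 1), (1, 4, 0), (5, 0, 1)],
)

# avg(a) over (partition by b) computes the mean per b but keeps all rows,
# with no join and no subquery.
query = """
select a, b, c, avg(a) over (partition by b) as avg_a
from mytable
order by rowid  -- SQLite-specific: keep insertion order
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```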

08-20 05:01