问题描述
我查询的系统不允许使用 DISTINCT ,因此我的替代方法是执行 GROUP BY 以接近结果
我想要的查询的意图如下:
SELECT
pre>
SUM(column1)AS column1,
SUM(column2)AS column2,
COUNT(DISTINCT(column3))AS column3
FROM table
,我会认为我需要一些类型的嵌套查询,沿着这一行
SELECT
SUM(column1)AS column1,
SUM(column2)AS column2,
COUNT(SELECT column FROM table GROUP BY column)AS column3
FROM table
但它没有工作。
解决方案您使用的语法错误
COUNT(DISTINCT)
。DISTINCT
部分是一个关键字,而不是一个函数。根据,这应该可以工作:SELECT
SUM(column1)AS column1,
SUM(column2)AS column2,
COUNT(DISTINCT column3)AS column3
FROM table
但是,请阅读文档。 BigQuery的
COUNT(DISTINCT)
的实现有点不寻常,显然是为了更好地扩展大数据。如果您尝试计数大量不同的值,那么您可能需要指定第二个参数(和,您有一个固有的缩放问题)。
更新:
如果您有大量不同的
column3
要计数,并且您想要确切的计数,则您可以执行连接,而不是在选择列表(BigQuery似乎不允许)中放置子查询:SELECT *
FROM(
SELECT
SUM(column1)AS column1,
SUM(column2)AS column2
FROM table
)
CROSS JOIN(
SELECT count(*)AS column3
FROM(
SELECT column3
FROM table
GROUP BY column3
)
)
更新2 :
不是加入两个单行表将是非常昂贵的,但@FelipeHoffa让我更多地思考这个,我意识到我错过了一个更简单的解决方案:
SELECT
SUM(column1)AS column1,
SUM(column2)AS column2,
COUNT AS column3
FROM(
SELECT
SUM(column1)AS column1,
SUM(column2)AS column2
FROM table
GROUP BY column3
)
这个计算
column1的小计
和column2
值,按column3
分组,然后计算和总计所有小计行。感觉很正确。i'm querying a system that won't allow using DISTINCT, so my alternative is to do a GROUP BY to get near to a result
my desired query was meant to look like this,
SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(DISTINCT(column3)) AS column3 FROM table
for the alternative, i would think i'd need some type of nested query along the lines of this,
SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(SELECT column FROM table GROUP BY column) AS column3 FROM table
but it didn't work. Am i close?
解决方案You are using the wrong syntax for
COUNT(DISTINCT)
. TheDISTINCT
part is a keyword, not a function. Based on the docs, this ought to work:SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(DISTINCT column3) AS column3 FROM table
Do, however, read the docs. BigQuery's implementation of
COUNT(DISTINCT)
is a bit unusual, apparently so as to scale better for big data. If you are trying to count a large number of distinct values then you may need to specify a second parameter (and you have an inherent scaling problem).Update:
If you have a large number of distinct
column3
values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):SELECT * FROM ( SELECT SUM(column1) AS column1, SUM(column2) AS column2 FROM table ) CROSS JOIN ( SELECT count(*) AS column3 FROM ( SELECT column3 FROM table GROUP BY column3 ) )
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT SUM(column1) AS column1, SUM(column2) AS column2, COUNT(*) AS column3 FROM ( SELECT SUM(column1) AS column1, SUM(column2) AS column2 FROM table GROUP BY column3 )
This one computes a subtotal of
column1
andcolumn2
values, grouping bycolumn3
, then counts and totals all the subtotal rows. It feels right.这篇关于如何做嵌套SQL select count的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!