Problem description
This is a follow-up question to Multi-level pivot in Google BigQuery, in which I wanted to know if it was possible to construct a nested pivot table in Google BigQuery using a single query. It is, and so in this follow-up question, I'd like to explore the general case.
Here is an example of the data that I'm using (which is also included in this shared Google Sheet):
Now, I would like to build a pivot table that has the following properties:
- Nested levels at both the row and col level (the previous question only had nested-cols)
- Sub-totals within both the rows and cols (the previous only had a grand total)
- Multiple metrics (the previous only had a single metric)
- Multiple sorts -- by both deep metrics and by alphabetical (the previous did not have any sort conditions)
- Limits (the previous did not have any limits at all)
Here is the pivot built in Google Sheets:
The conceptual SQL statement here would be:
SELECT
  SUM(price),
  COUNT(price)
BROKEN DOWN BY
  Studio (row),
  Title (row),
  Territory ID (col),
  Type (col)
SORTED/LIMITED BY
  Studio ==> A-Z, LIMIT 3,
  Title ==> SUM(price) in GRAND TOTAL DESC, LIMIT 4,
  Territory ID ==> COUNT(price) in Paramount TOTAL, LIMIT 2,
  Type ==> A-Z, NO LIMIT
I'm not sure how to show the subtotals conceptually, but we should be able to specify them for each of the broken-down-by fields.
Is it possible to do the above in a single SQL statement in Google BigQuery? What would be the steps to generate it?
Solution
Usually, you would run something like the query below on the back end and pull the result into a visualization tool (front end) for further manipulation such as sorting, limiting, and pivoting.
#standardSQL
-- One row per (Studio, Title, TerritoryID, Type) combination
SELECT
  Studio,
  Title,
  TerritoryID,
  Type,
  SUM(Price) AS Price,
  COUNT(1) AS Volume
FROM yourTable
GROUP BY Studio, Title, TerritoryID, Type
As you mentioned, in your case such a result can easily reach 10M+ rows, and you want to reduce its size without affecting your ability to present the final data in a pivot/visualization on the front end.
The queries below show how to achieve this by applying the sorts and limits on the back end (so the result size is drastically reduced) without losing the ability to pivot and still show totals, etc.
Let’s get to the final query by starting with a simplified one.
- Initial query (skeleton)
Let’s assume that, based on known criteria, we know in advance which Studios, Titles, Territories and Types should be selected.
In this case, the query below returns the desired data:
#standardSQL
WITH Studios AS (
  -- Hard-coded list of studios
  SELECT 'Fox' AS Studio
  UNION ALL SELECT 'Paramount'
),
Titles AS (
  -- Hard-coded (Studio, Title) pairs to keep as-is
  SELECT 'Fox' AS Studio, 'Best Laid Plans' AS Title
  UNION ALL SELECT 'Fox', 'Homecoming'
  UNION ALL SELECT 'Paramount', 'Titanic'
  UNION ALL SELECT 'Paramount', 'Homecoming'
),
Territories AS (
  -- Hard-coded territories to keep as-is
  SELECT 'US' AS TerritoryID
  UNION ALL SELECT 'GB'
),
Totals AS (
  -- Rows that match no selected title or territory fall into an 'Other' bucket
  SELECT
    IFNULL(b.Studio, 'Other') AS Studio,
    IFNULL(b.Title, 'Other') AS Title,
    IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
    Type,
    ROUND(SUM(Price), 2) AS Price, COUNT(1) AS Volume
  FROM yourTable AS a
  LEFT JOIN Titles AS b ON a.Studio = b.Studio AND a.Title = b.Title
  LEFT JOIN Territories AS c ON a.TerritoryID = c.TerritoryID
  GROUP BY Studio, Title, TerritoryID, Type
)
SELECT * FROM Totals
ORDER BY Studio, Title, TerritoryID, Type
The output will look something like this:
Studio     Title            TerritoryID  Type        Price    Volume
Fox        Best Laid Plans  GB           Movie       87.32    18
Fox        Best Laid Plans  GB           TV Episode  50.17    23
Fox        Best Laid Plans  Other        TV Episode  1131.0   2
Fox        Best Laid Plans  US           Movie       120.82   18
Fox        Best Laid Plans  US           TV Episode  53.76    24
Fox        Homecoming       GB           TV Episode  60.22    28
Fox        Homecoming       Other        TV Episode  2262.0   4
Fox        Homecoming       US           TV Episode  128.45   58
Other      Other            GB           Movie       142.71   29
Other      Other            GB           TV Episode  84.8     40
Other      Other            Other        Movie       3292.0   4
Other      Other            Other        TV Episode  3282.0   16
Other      Other            US           Movie       52.92    8
Other      Other            US           TV Episode  233.05   101
Paramount  Homecoming       GB           Movie       18.96    4
Paramount  Homecoming       US           Movie       124.84   16
Paramount  Titanic          GB           Movie       41.92    8
Paramount  Titanic          Other        Movie       12.0     4
Paramount  Titanic          US           Movie       139.84   16
You can easily feed this result back to your UI and visualize it in whatever way you need.
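The 'Other' rows in that output come from the LEFT JOIN plus IFNULL in the Totals CTE: any row whose title or territory is not in the selected lists is collapsed into an 'Other' bucket rather than dropped, which is why the totals survive. Below is a minimal Python sketch of the same idea. It is not part of the original answer; the toy rows and the `bucket` function are made up for illustration.

```python
from collections import defaultdict

# Toy rows: (Studio, Title, TerritoryID, Type, Price). Illustrative values only.
rows = [
    ("Fox", "Homecoming", "US", "TV Episode", 2.0),
    ("Fox", "Homecoming", "FR", "TV Episode", 3.0),
    ("Sony", "Titanic", "GB", "Movie", 5.0),
    ("Paramount", "Titanic", "US", "Movie", 7.0),
]

# Selected dimension values, playing the role of the Titles and Territories CTEs.
selected_titles = {("Fox", "Homecoming"), ("Paramount", "Titanic")}
selected_territories = {"US", "GB"}


def bucket(rows, titles, territories):
    """Aggregate rows, collapsing unselected values into 'Other'."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of Price, row count]
    for studio, title, territory, type_, price in rows:
        if (studio, title) not in titles:  # LEFT JOIN Titles found no match
            studio, title = "Other", "Other"
        if territory not in territories:  # LEFT JOIN Territories found no match
            territory = "Other"
        cell = totals[(studio, title, territory, type_)]
        cell[0] += price
        cell[1] += 1
    return dict(totals)


totals = bucket(rows, selected_titles, selected_territories)
for key in sorted(totals):
    print(key, totals[key])
```

Because nothing is filtered out at this stage, the sum over all buckets equals the sum over the raw rows, so a grand total computed on the front end stays correct.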
- "Final" query
Now, instead of hard-coding values for the dimensions involved, let’s implement the actual criteria for each dimension.
So the only changes in the query below (versus the skeleton query above) are in the following CTEs: Studios, Titles, and Territories.
#standardSQL
WITH Studios AS (
  -- First 3 studios in alphabetical order
  SELECT DISTINCT Studio
  FROM yourTable
  ORDER BY Studio LIMIT 3
),
Titles AS (
  -- Top 4 titles per studio, ranked by total price
  SELECT Studio, Title
  FROM (
    SELECT Studio, Title, ROW_NUMBER() OVER(PARTITION BY Studio ORDER BY Price DESC) AS pos
    FROM (SELECT Studio, Title, SUM(Price) AS Price FROM yourTable GROUP BY Studio, Title)
  ) WHERE pos <= 4
),
Territories AS (
  -- Top 2 territories by number of Paramount rows
  SELECT TerritoryID FROM yourTable
  WHERE Studio = 'Paramount' GROUP BY TerritoryID
  ORDER BY COUNT(1) DESC LIMIT 2
),
Totals AS (
  -- Rows that match no selected title or territory fall into an 'Other' bucket
  SELECT
    IFNULL(b.Studio, 'Other') AS Studio,
    IFNULL(b.Title, 'Other') AS Title,
    IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
    Type,
    ROUND(SUM(Price), 2) AS Price, COUNT(1) AS Volume
  FROM yourTable AS a
  LEFT JOIN Titles AS b ON a.Studio = b.Studio AND a.Title = b.Title
  LEFT JOIN Territories AS c ON a.TerritoryID = c.TerritoryID
  GROUP BY Studio, Title, TerritoryID, Type
)
SELECT * FROM Totals
WHERE TerritoryID != 'Other'  -- drop the 'Other' territory bucket
ORDER BY Studio, TerritoryID DESC, Type, Price DESC, Title
The result here is:
Studio     Title                  TerritoryID  Type        Price    Volume
Fox        Best Laid Plans        US           Movie       120.82   18
Fox        Titanic                US           Movie       52.92    8
Fox        1:00 P.M. - 2:00 P.M.  US           TV Episode  187.25   81
Fox        Homecoming             US           TV Episode  128.45   58
Fox        Best Laid Plans        US           TV Episode  53.76    24
Fox        Best Laid Plans        GB           Movie       87.32    18
Fox        Titanic                GB           Movie       78.84    16
Fox        1:00 P.M. - 2:00 P.M.  GB           TV Episode  61.42    28
Fox        Homecoming             GB           TV Episode  60.22    28
Fox        Best Laid Plans        GB           TV Episode  50.17    23
Paramount  Titanic                US           Movie       139.84   16
Paramount  Homecoming             US           Movie       124.84   16
Paramount  Titanic                GB           Movie       41.92    8
Paramount  Homecoming             GB           Movie       18.96    4
Sony       Best Laid Plans        US           TV Episode  22.9     10
Sony       Homecoming             US           TV Episode  22.9     10
Sony       Best Laid Plans        GB           Movie       63.87    13
Sony       Homecoming             GB           TV Episode  18.81    9
Sony       Best Laid Plans        GB           TV Episode  4.57     3
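To sanity-check what the three selection CTEs (Studios, Titles, Territories) are meant to pick, their logic can be mirrored in plain Python on a small sample. This is only a sketch under stated assumptions: the sample rows and the function names `top_studios`, `top_titles` and `top_territories` are hypothetical, and ties are broken by first appearance here, whereas ROW_NUMBER() and LIMIT in BigQuery may break ties arbitrarily.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (Studio, Title, TerritoryID, Price).
rows = [
    ("Fox", "Best Laid Plans", "US", 5.0),
    ("Fox", "Homecoming", "US", 9.0),
    ("Fox", "Titanic", "GB", 1.0),
    ("Paramount", "Titanic", "US", 4.0),
    ("Paramount", "Titanic", "US", 3.0),
    ("Paramount", "Titanic", "GB", 2.0),
    ("Paramount", "Homecoming", "US", 1.0),
    ("Paramount", "Homecoming", "GB", 1.5),
    ("Paramount", "Homecoming", "FR", 0.5),
    ("Sony", "Homecoming", "GB", 3.0),
    ("Universal", "Titanic", "US", 8.0),
]


def top_studios(rows, limit):
    """Studios CTE: distinct studios in A-Z order, first `limit` of them."""
    return sorted({studio for studio, _, _, _ in rows})[:limit]


def top_titles(rows, limit):
    """Titles CTE: per studio, the `limit` titles with the largest SUM(Price)."""
    sums = defaultdict(float)
    for studio, title, _, price in rows:
        sums[(studio, title)] += price
    ranked = defaultdict(list)
    # Visit (studio, title) pairs from highest to lowest total price.
    for (studio, title), _ in sorted(sums.items(), key=lambda kv: -kv[1]):
        if len(ranked[studio]) < limit:
            ranked[studio].append(title)
    return dict(ranked)


def top_territories(rows, studio, limit):
    """Territories CTE: territories with the most rows for one studio."""
    counts = Counter(territory for s, _, territory, _ in rows if s == studio)
    return [territory for territory, _ in counts.most_common(limit)]


print(top_studios(rows, 3))
print(top_titles(rows, 2))
print(top_territories(rows, "Paramount", 2))
```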
The point here is that, while BigQuery is extremely efficient at analyzing billions of rows and extracting the needed info, it is quite inefficient to use BigQuery to tailor the result data to reflect how it will be presented in the presentation layer of the client UI. Instead, you should just pass this data to the UI and have your visualization code handle it.
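As a rough illustration of what "have your visualization code handle it" could look like, here is a minimal Python sketch that turns long-format rows into a nested pivot with subtotals and a grand total. The `pivot` function and the `TOTAL` marker are assumptions made for this example, not part of the original answer; the four input rows are the Paramount rows from the result above.

```python
from collections import defaultdict

# Long-format rows as returned by the Totals query:
# (Studio, Title, TerritoryID, Type, Price, Volume).
rows = [
    ("Paramount", "Titanic", "US", "Movie", 139.84, 16),
    ("Paramount", "Homecoming", "US", "Movie", 124.84, 16),
    ("Paramount", "Titanic", "GB", "Movie", 41.92, 8),
    ("Paramount", "Homecoming", "GB", "Movie", 18.96, 4),
]

TOTAL = "Total"  # hypothetical marker for subtotal and grand-total keys


def pivot(rows):
    """Return {row_key: {col_key: [price, volume]}} with subtotals included."""
    cells = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for studio, title, territory, type_, price, volume in rows:
        # Each source row contributes to its own cell, to the subtotal of its
        # parent level, and to the grand total, on both the row and column axes.
        row_keys = [(studio, title), (studio, TOTAL), (TOTAL, TOTAL)]
        col_keys = [(territory, type_), (territory, TOTAL), (TOTAL, TOTAL)]
        for row_key in row_keys:
            for col_key in col_keys:
                cell = cells[row_key][col_key]
                cell[0] += price
                cell[1] += volume
    return cells


cells = pivot(rows)
grand = cells[(TOTAL, TOTAL)][(TOTAL, TOTAL)]
print("Grand total:", round(grand[0], 2), grand[1])
print("Paramount / US subtotal:", cells[("Paramount", TOTAL)][("US", TOTAL)])
```

Since subtotals are just sums over the already aggregated rows, they cost little on the client and do not need to be materialized in the BigQuery result.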