问题描述
我已经问过有关在X轴上进行多级聚合查询的先前问题:
我能够在第二个查询中构建Y查询-
使用AS(SELECT family_id来自`patents-public-data.patents.publications`GROUP BY family_id按计数排序(1)DESC极限2),BAS(选择family_id,优先日期,COUNT(1)priority_date_count来自`patents-public-data.patents.publications`WHERE family_id IN(从A中选择family_id)GROUP BY family_id,priority_date),C AS(选择family_id,优先日期,priority_date_count,DENSE_RANK()OVER(PARTITION BY family_id ORDER BY priority_date_count DESC)AS priority_date_rank从B)选择family_id,优先日期,priority_date_count来自C在哪里priority_date_rank< = 2
但是,我不太确定如何在单个查询或两个查询中将它们合并在一起.
下面是BigQuery标准SQL的演示,仅是该方法的演示,并不假装100%表示请求的逻辑
与A_X AS(从`patents-public-data.patents.publications`中选择country_codeGROUP BY country_code ORDER BY COUNT(1)DESC LIMIT 2),B_X AS(SELECT country_code,application_kind,COUNT(1)application_kind_count从`patents-public-data.patents.publications`中的country_code输入(从A_X中选择country_code)GROUP BY国家/地区代码,application_kind),C_X AS(SELECT country_code,application_kind,application_kind_count,DENSE_RANK()OVER(按国家(地区)代码排序或按application_kind_count DESC排序)AS application_kind_rank来自B_X),X AS(选择国家/地区代码,application_kind,application_kind_count从C_X WHERE application_kind_rank< = 2),A_Y AS(从`patents-public-data.patents.publications`中选择family_id加入X使用(country_code,application_kind)GROUP BY family_id按计数排序(1)上限2),B_Y AS(SELECT family_id,priority_date,COUNT(1)priority_date_count从`patents-public-data.patents.publications`中的family_id输入(从A_Y选择SELECT family_id)GROUP BY family_id,priority_date),C_Y AS(SELECT family_id,priority_date,priority_date_count,DENSE_RANK()OVER(PARTITION BY family_id ORDER BY priority_date_count DESC)AS pos_date来自B_Y),Y AS(SELECT family_id,priority_date,pos_date,DENSE_RANK()OVER(ORDER BY family_id)pos_family从C_Y WHERE pos_date< = 2)SELECT country_code,application_kind,COUNTIF(pos_family = 1 AND pos_date = 1)`family1_date1`,COUNTIF(pos_family = 1 AND pos_date = 2)`family1_date2`,COUNTIF(pos_family = 2 AND pos_date = 1)`family2_date1`,COUNTIF(pos_family = 2 AND pos_date = 2)`family2_date2`来自`patents-public-data.patents.publications`加入您的使用情况(family_id,priority_date)WHERE country_code IN(从X中选择country_code)AND application_kind IN(从x中选择application_kind)GROUP BY国家/地区代码,application_kind
结果是
很明显,由于相交逻辑,上面有零个数字
I have asked a previous question about doing a multi-level aggregation query on the X-axis here: Get the top patent countries, codes in a BQ public dataset.
Here is how the query (copied from the accepted answer works) to get:
Top 2 Countries by Count, and within those countries, top 2 Codes by Count
WITH A AS (
SELECT country_code
FROM `patents-public-data.patents.publications`
GROUP BY country_code
ORDER BY COUNT(1) DESC
LIMIT 2
), B AS (
SELECT
country_code,
application_kind,
COUNT(1) application_kind_count
FROM `patents-public-data.patents.publications`
WHERE country_code IN (SELECT country_code FROM A)
GROUP BY country_code, application_kind
), C AS (
SELECT
country_code,
application_kind,
application_kind_count,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY application_kind_count DESC) AS application_kind_rank
FROM B
)
SELECT
country_code,
application_kind,
application_kind_count
FROM C
WHERE application_kind_rank <= 2
And I get something like:
country_code application_kind count
JP A 125
JP U 124
CN A 118
CN U 101
Now I would like to add the following pivot on the y-axis: to get the following:
- X: Top 2 Countries by Count, and within those countries, top 2 Codes by Count
- Y: Top 2 family_id by Count, Top 2 priority_date by Count
The final results would then look like:
I am able to build the Y-query in a second query --
WITH A AS (
SELECT family_id
FROM `patents-public-data.patents.publications`
GROUP BY family_id
ORDER BY COUNT(1) DESC
LIMIT 2
), B AS (
SELECT
family_id,
priority_date,
COUNT(1) priority_date_count
FROM `patents-public-data.patents.publications`
WHERE family_id IN (SELECT family_id FROM A)
GROUP BY family_id, priority_date
), C AS (
SELECT
family_id,
priority_date,
priority_date_count,
DENSE_RANK() OVER(PARTITION BY family_id ORDER BY priority_date_count DESC) AS priority_date_rank
FROM B
)
SELECT
family_id,
priority_date,
priority_date_count
FROM C
WHERE priority_date_rank <= 2
However, I am not quite sure how to merge them together, in a single query or in two.
Below is for BigQuery Standard SQL and is just demo of the approach and not pretending to be 100% representing requested logic
WITH A_X AS (
SELECT country_code FROM `patents-public-data.patents.publications`
GROUP BY country_code ORDER BY COUNT(1) DESC LIMIT 2
), B_X AS (
SELECT country_code, application_kind, COUNT(1) application_kind_count
FROM `patents-public-data.patents.publications` WHERE country_code IN (SELECT country_code FROM A_X)
GROUP BY country_code, application_kind
), C_X AS (
SELECT country_code, application_kind, application_kind_count,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY application_kind_count DESC) AS application_kind_rank
FROM B_X
), X AS (
SELECT country_code, application_kind, application_kind_count
FROM C_X WHERE application_kind_rank <= 2
), A_Y AS (
SELECT family_id FROM `patents-public-data.patents.publications`
JOIN X USING(country_code, application_kind)
GROUP BY family_id
ORDER BY COUNT(1) DESC LIMIT 2
), B_Y AS (
SELECT family_id, priority_date, COUNT(1) priority_date_count
FROM `patents-public-data.patents.publications` WHERE family_id IN (SELECT family_id FROM A_Y)
GROUP BY family_id, priority_date
), C_Y AS (
SELECT family_id, priority_date, priority_date_count,
DENSE_RANK() OVER(PARTITION BY family_id ORDER BY priority_date_count DESC) AS pos_date
FROM B_Y
), Y AS (
SELECT family_id, priority_date, pos_date, DENSE_RANK() OVER(ORDER BY family_id) pos_family
FROM C_Y WHERE pos_date <= 2
)
SELECT country_code, application_kind,
COUNTIF(pos_family = 1 AND pos_date = 1) `family1_date1`,
COUNTIF(pos_family = 1 AND pos_date = 2) `family1_date2`,
COUNTIF(pos_family = 2 AND pos_date = 1) `family2_date1`,
COUNTIF(pos_family = 2 AND pos_date = 2) `family2_date2`
FROM `patents-public-data.patents.publications`
JOIN Y USING(family_id, priority_date)
WHERE country_code IN (SELECT country_code FROM X)
AND application_kind IN (SELECT application_kind FROM x)
GROUP BY country_code, application_kind
the result is
Obviously, there are number of zeroes above because of intersection logic
这篇关于在Google BigQuery中做一个交叉枢纽的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!