本文介绍了在BigQuery中的联接表上,使用Group By子句获取“最新行"的最有效查询是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
当前设计
Table: 1_notes
------------------------------------------
| id | text | created_at |
------------------------------------------
| 1_1 | u1 first note | 2018-01-01 10:00:00 |
| 1_2 | u1 second note | 2018-01-03 10:00:00 |
Table: 1_note_timeline
---------------------------------------------------------------------
| note_id | note_created_at | likes_count | created_at |
---------------------------------------------------------------------
| 1_1 | 2018-01-01 10:00:00 | 10 | 2018-01-01 10:00:00 |
| 1_1 | 2018-01-01 10:00:00 | 20 | 2018-01-02 10:00:00 |
| 1_2 | 2018-01-03 10:00:00 | 10 | 2018-01-03 10:00:00 |
| 1_1 | 2018-01-01 10:00:00 | 15 | 2018-01-03 10:00:00 |
Table: 2_notes
--------------------------------------------
| id | text | created_at |
--------------------------------------------
| 2_1 | u2 first note | 2018-01-01 10:00:00 |
| 2_2 | u2 second note | 2018-01-03 10:00:00 |
Table: 2_note_timeline
---------------------------------------------------------------------
| note_id | note_created_at | likes_count | created_at |
---------------------------------------------------------------------
| 2_1 | 2018-01-01 10:00:00 | 10 | 2018-01-01 10:00:00 |
| 2_1 | 2018-01-01 10:00:00 | 20 | 2018-01-02 10:00:00 |
| 2_2 | 2018-01-03 10:00:00 | 10 | 2018-01-03 10:00:00 |
| 2_1 | 2018-01-01 10:00:00 | 15 | 2018-01-03 10:00:00 |
对于每个用户,其笔记数据有2个表.
ForEach user there are 2 tables for their notes data.
- {{userId}} _ notes表包含ID为
userId
的用户的注释 - {{userId}} _ note_timeline表每天跟踪记录的点赞"次数
所需结果应具有:
- 最喜欢的两个用户的前2个笔记
- 点数必须是最新的(并且不能为MAX_VALUE,因为likes_count的注释会随着时间减少)
最终输出
Output
---------------------------------------------------------------
| note_id | note_created_at | likes_count | text |
---------------------------------------------------------------
| 1_1 | 2018-01-01 10:00:00 | 15 | u1 first note |
| 2_1 | 2018-01-01 10:00:00 | 15 | u2 first note |
推荐答案
以下是BigQuery标准SQL
Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.user_notes` AS (
SELECT * FROM `project.dataset.user1_notes` UNION ALL
SELECT * FROM `project.dataset.user2_notes`
), `project.dataset.user_note_timeline` AS (
SELECT * FROM `project.dataset.user1_note_timeline` UNION ALL
SELECT * FROM `project.dataset.user2_note_timeline`
)
SELECT note_id, note_created_at, likes_count, text
FROM (
SELECT note_id, ARRAY_AGG(STRUCT(note_created_at, likes_count, created_at) ORDER BY created_at DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.user_note_timeline`
GROUP BY note_id
ORDER BY likes_count DESC, note_created_at
LIMIT 2
) t
JOIN `project.dataset.user_notes` n
ON note_id = id
这篇关于在BigQuery中的联接表上,使用Group By子句获取“最新行"的最有效查询是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!