问题描述
假设我在bigquery中有数据视图
Suppose I have view of data in bigquery
Person | Amount | yearMonth
---------------------------
AA | 100 | 201701
AA | 200 | 201702
AA | 300 | 201703
AA | 70 | 201704
AB | 10 | 201701
AB | 50 | 201702
AB | 60 | 201703
AB | 70 | 201704
AC | 70 | 201701
AC | 80 | 201702
AC | 30 | 201703
AC | 10 | 201704
现在,我需要获取每个人每个月最近3个月的平均值
Now, I need to get the average of this for the last 3 months for every person every month
预期结果:
Person | Amount | yearMonth
---------------------------
AA | 200 | 201703(avg of 201701-201703)
AA | 190 | 201704(avg of 201702-201704)
AB | 40 | 201703(avg of 201701-201703)
AB | 60 | 201704(avg of 201702-201704)
AC | 60 | 201703(avg of 201701-201703)
AC | 40 | 201704(avg of 201702-201704)
这是怎么计算的?
第一行
- AA = 200,来自100(201701)+200(201702)+300(201703) )/ 3 = 200
- AA = 100,来自200(201702)+300(201703)+70(201704)/ 3 = 190
- AB = 40,来自10(201701)+50(201702)+60(201703)/ 3 = 40
- 不久之后
- AA = 200, comes from 100(201701)+200(201702)+300(201703)/3 = 200
- AA = 100, comes from 200(201702)+300(201703)+70(201704)/3 = 190
- AB = 40, comes from 10(201701)+50(201702)+60(201703)/3 = 40
- and soon
我不太确定该如何分组。我不介意您的答案是否有与此问题的链接。
I am not really sure how to group by this. I don't mind if your answer has a link to this issue.
感谢堆积物
也是可能在旧版SQL中?我还没有迁移到standardSQL。
我的观点是在旧版SQL中
Is It also possible in legacy SQL? I haven't migrated to standardSQL yet.my view is in legacy SQL
推荐答案
下面是针对BigQuery标准SQL的建议(至少应该给您一个关于正确分组的逻辑)
Below is for BigQuery Standard SQL (at least should give you an idea on the logic of proper grouping)
#standardSQL
SELECT
person, yearMonth, CAST(amount AS INT64) amount
FROM (
SELECT
person, yearMonth, dt,
AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
FROM (
SELECT
person, amount, yearMonth,
UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
FROM `project.dataset.table`
)
)
WHERE months = 3
-- ORDER BY person, yearMonth
您可以使用以下虚拟数据测试/玩游戏
You can test / play with it with dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
SELECT 'AA', 200, 201702 UNION ALL
SELECT 'AA', 300, 201703 UNION ALL
SELECT 'AA', 70, 201704 UNION ALL
SELECT 'AB', 10, 201701 UNION ALL
SELECT 'AB', 50, 201702 UNION ALL
SELECT 'AB', 60, 201703 UNION ALL
SELECT 'AB', 70, 201704 UNION ALL
SELECT 'AC', 70, 201701 UNION ALL
SELECT 'AC', 80, 201702 UNION ALL
SELECT 'AC', 30, 201703 UNION ALL
SELECT 'AC', 10, 201704
)
SELECT
person, yearMonth, CAST(amount AS INT64) amount
FROM (
SELECT
person, yearMonth, dt,
AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
FROM (
SELECT
person, amount, yearMonth,
UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
FROM `project.dataset.table`
)
)
WHERE months = 3
ORDER BY person, yearMonth
输出为预期
person yearMonth amount
AA 201703 200
AA 201704 190
AB 201703 40
AB 201704 60
AC 201703 60
AC 201704 40
#legacySQL
SELECT
person, yearMonth, INTEGER(amount) amount
FROM (
SELECT
person, yearMonth, dt,
AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
FROM (
SELECT
person, amount, yearMonth,
TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
FROM [project:dataset.table]
)
)
WHERE months = 3
-- ORDER BY person, yearMonth
您可以使用下面的示例对虚拟数据进行测试/使用
You can test / play with it using below example with dummy data
#legacySQL
SELECT
person, yearMonth, INTEGER(amount) amount
FROM (
SELECT
person, yearMonth, dt,
AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
FROM (
SELECT
person, amount, yearMonth,
TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
FROM -- [project:dataset.table]
(SELECT 'AA' person, 100 amount, 201701 yearMonth),
(SELECT 'AA' person, 200 amount, 201702 yearMonth),
(SELECT 'AA' person, 300 amount, 201703 yearMonth),
(SELECT 'AA' person, 70 amount, 201704 yearMonth),
(SELECT 'AB' person, 10 amount, 201701 yearMonth),
(SELECT 'AB' person, 50 amount, 201702 yearMonth),
(SELECT 'AB' person, 60 amount, 201703 yearMonth),
(SELECT 'AB' person, 70 amount, 201704 yearMonth),
(SELECT 'AC' person, 70 amount, 201701 yearMonth),
(SELECT 'AC' person, 80 amount, 201702 yearMonth),
(SELECT 'AC' person, 30 amount, 201703 yearMonth),
(SELECT 'AC' person, 10 amount, 201704 yearMonth)
)
)
WHERE months = 3
ORDER BY person, yearMonth
这篇关于总计QUERY-每月每人平均3个月的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!