本文介绍了总计QUERY-每月每人平均3个月的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我在bigquery中有数据视图

Suppose I have view of data in bigquery

Person | Amount | yearMonth
---------------------------
AA     |   100  |   201701
AA     |   200  |   201702
AA     |   300  |   201703
AA     |   70   |   201704
AB     |   10   |   201701
AB     |   50   |   201702
AB     |   60   |   201703
AB     |   70   |   201704
AC     |   70   |   201701
AC     |   80   |   201702
AC     |   30   |   201703
AC     |   10   |   201704

现在,我需要获取每个人每个月最近3个月的平均值

Now, I need to get the average of this for the last 3 months for every person every month

预期结果:

Person | Amount | yearMonth
---------------------------
AA     |   200  |   201703(avg of 201701-201703)
AA     |   190  |   201704(avg of 201702-201704)
AB     |   40   |   201703(avg of 201701-201703)
AB     |   60   |   201704(avg of 201702-201704)
AC     |   60   |   201703(avg of 201701-201703)
AC     |   40   |   201704(avg of 201702-201704)

这是怎么计算的?

第一行


  • AA = 200,来自100(201701)+200(201702)+300(201703) )/ 3 = 200

  • AA = 100,来自200(201702)+300(201703)+70(201704)/ 3 = 190

  • AB = 40,来自10(201701)+50(201702)+60(201703)/ 3 = 40

  • 不久之后

  • AA = 200, comes from 100(201701)+200(201702)+300(201703)/3 = 200
  • AA = 100, comes from 200(201702)+300(201703)+70(201704)/3 = 190
  • AB = 40, comes from 10(201701)+50(201702)+60(201703)/3 = 40
  • and soon

我不太确定该如何分组。我不介意您的答案是否有与此问题的链接。

I am not really sure how to group by this. I don't mind if your answer has a link to this issue.

感谢堆积物

也是可能在旧版SQL中?我还没有迁移到standardSQL。
我的观点是在旧版SQL中

Is It also possible in legacy SQL? I haven't migrated to standardSQL yet.my view is in legacy SQL

推荐答案

下面是针对BigQuery标准SQL的建议(至少应该给您一个关于正确分组的逻辑)

Below is for BigQuery Standard SQL (at least should give you an idea on the logic of proper grouping)

#standardSQL
SELECT
  person, yearMonth, CAST(amount AS INT64) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
  FROM (
    SELECT
      person, amount, yearMonth,
      UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
    FROM `project.dataset.table`
  )
)
WHERE months = 3
-- ORDER BY person, yearMonth

您可以使用以下虚拟数据测试/玩游戏

You can test / play with it with dummy data as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'AA' person, 100 amount, 201701 yearMonth UNION ALL
  SELECT 'AA', 200, 201702 UNION ALL
  SELECT 'AA', 300, 201703 UNION ALL
  SELECT 'AA', 70, 201704 UNION ALL
  SELECT 'AB', 10, 201701 UNION ALL
  SELECT 'AB', 50, 201702 UNION ALL
  SELECT 'AB', 60, 201703 UNION ALL
  SELECT 'AB', 70, 201704 UNION ALL
  SELECT 'AC', 70, 201701 UNION ALL
  SELECT 'AC', 80, 201702 UNION ALL
  SELECT 'AC', 30, 201703 UNION ALL
  SELECT 'AC', 10, 201704
)
SELECT
  person, yearMonth, CAST(amount AS INT64) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt RANGE BETWEEN 63 PRECEDING AND CURRENT row) months
  FROM (
    SELECT
      person, amount, yearMonth,
      UNIX_DATE(DATE(DIV(yearMonth, 100), MOD(yearMonth, 100), 1)) AS dt
    FROM `project.dataset.table`
  )
)
WHERE months = 3
ORDER BY person, yearMonth

输出为预期

person  yearMonth   amount
AA      201703      200
AA      201704      190
AB      201703      40
AB      201704      60
AC      201703      60
AC      201704      40





#legacySQL
SELECT
  person, yearMonth, INTEGER(amount) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
  FROM (
    SELECT
      person, amount, yearMonth,
      TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
    FROM [project:dataset.table]
  )
)
WHERE months = 3
-- ORDER BY person, yearMonth

您可以使用下面的示例对虚拟数据进行测试/使用

You can test / play with it using below example with dummy data

#legacySQL
SELECT
  person, yearMonth, INTEGER(amount) amount
FROM (
  SELECT
    person, yearMonth, dt,
    AVG(amount) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) amount,
    COUNT(1) OVER(PARTITION BY person ORDER BY dt range BETWEEN 63*60*60*24 preceding AND current row) months
  FROM (
    SELECT
      person, amount, yearMonth,
      TIMESTAMP_TO_SEC(TIMESTAMP(CONCAT(STRING(INTEGER(yearMonth/100)), '-', SUBSTR(STRING(100 + yearMonth % 100), 2, 2), '-01'))) AS dt
    FROM -- [project:dataset.table]
      (SELECT 'AA' person, 100 amount, 201701 yearMonth),
      (SELECT 'AA' person, 200 amount, 201702 yearMonth),
      (SELECT 'AA' person, 300 amount, 201703 yearMonth),
      (SELECT 'AA' person, 70 amount, 201704 yearMonth),
      (SELECT 'AB' person, 10 amount, 201701 yearMonth),
      (SELECT 'AB' person, 50 amount, 201702 yearMonth),
      (SELECT 'AB' person, 60 amount, 201703 yearMonth),
      (SELECT 'AB' person, 70 amount, 201704 yearMonth),
      (SELECT 'AC' person, 70 amount, 201701 yearMonth),
      (SELECT 'AC' person, 80 amount, 201702 yearMonth),
      (SELECT 'AC' person, 30 amount, 201703 yearMonth),
      (SELECT 'AC' person, 10 amount, 201704 yearMonth)
  )
)
WHERE months = 3
ORDER BY person, yearMonth

这篇关于总计QUERY-每月每人平均3个月的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-01 17:06