28天滑动窗口聚合的BigQuery

28天滑动窗口聚合的BigQuery

本文介绍了28天滑动窗口聚合的BigQuery SQL(不用写28行SQL)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 LAG 函数在 BigQuery 中计算 28 天的移动总和.

I'm trying to compute a 28 day moving sum in BigQuery using the LAG function.

这个问题的最佳答案

用于滑动窗口聚合的 Bigquery SQL

来自 Felipe Hoffa 表示您可以使用 LAG 功能.一个例子是:

from Felipe Hoffa indicates that that you can use the LAG function. An example of this would be:

SELECT
    spend + spend_lagged_1day + spend_lagged_2day + spend_lagged_3day + ... +  spend_lagged_27day as spend_28_day_sum,
    user,
    date
FROM (
  SELECT spend,
         LAG(spend, 1) OVER (PARTITION BY user ORDER BY date) spend_lagged_1day,
         LAG(spend, 2) OVER (PARTITION BY user ORDER BY date) spend_lagged_2day,
         LAG(spend, 3) OVER (PARTITION BY user ORDER BY date) spend_lagged_3day,
         ...
         LAG(spend, 28) OVER (PARTITION BY user ORDER BY date) spend_lagged_day,
         user,
         date
  FROM user_spend
)

有没有办法不用写出 28 行 SQL 就可以做到这一点!

Is there a way to do this without having to write out 28 lines of SQL!

推荐答案

BigQuery 文档没有很好地解释该工具支持的窗口函数的复杂性,因为它没有指定在 ROWS 之后可以出现哪些表达式或范围.它实际上支持窗口函数的 SQL 2003 标准,您可以在网络上的其他地方找到相关文档,例如 这里.

The BigQuery documentation doesn't do a good job of explaining the complexity of window functions that the tool supports because it doesn't specify what expressions can appear after ROWS or RANGE. It actually supports the SQL 2003 standard for window functions, which you can find documented other places on the web, such as here.

这意味着您可以使用单个窗口函数获得您想要的效果.范围是 27,因为它是要包含在总和中的当前行之前的行数.

That means you can get the effect you want with a single window function. The range is 27 because it's how many rows before the current one to include in the sum.

SELECT spend,
       SUM(spend) OVER (PARTITION BY user ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW),
       user,
       date
FROM user_spend;

范围界限也非常有用.如果您的表缺少某些用户的日期,那么 27 个 PRECEDING 行将返回超过 27 天,但 RANGE 将根据日期值本身生成一个窗口.在以下查询中,日期字段是 BigQuery TIMESTAMP,范围以微秒为单位指定.我建议,每当您在 BigQuery 中进行这样的约会数学运算时,都要对其进行彻底的测试,以确保它给出了预期的答案.

A RANGE bound can also be extremely useful. If your table was missing dates for some user, then 27 PRECEDING rows would go back more than 27 days, but RANGE will produce a window based on the date values themselves. In the following query, the date field is a BigQuery TIMESTAMP and the range is specified in microseconds. I'd advise that whenever you do date math like this in BigQuery, you test it thoroughly to make sure it's giving you the expected answer.

SELECT spend,
       SUM(spend) OVER (PARTITION BY user ORDER BY date RANGE BETWEEN 27 * 24 * 60 * 60 * 1000000 PRECEDING AND CURRENT ROW),
       user,
       date
FROM user_spend;

这篇关于28天滑动窗口聚合的BigQuery SQL(不用写28行SQL)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-27 20:53