问题描述
我发现了一个类似的问题(重复记录以填充Google BigQuery中的日期之间的空白)),但是使用不同的方案并且不存在应答。
+ ------- ----- + --------- + --------- + ------- +
|日期|产品|伙伴|值|
+ ------------ + --------- + --------- + ------- +
| 2017-01-01 | a | x | 10 |
| 2017-01-01 | b | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-01 | b | y | 16 |
| 2017-01-05 | b | x | 13 |
| 2017-01-07 | a | y | 15 |
| 2017-01-07 | a | x | 15 |
+ ------------ + --------- + --------- + ------- +
我需要的是一个查询(专门用BigQuery标准SQL编写),给定日期范围(在这种情况下, 2017-01-01
至 2017-01-10
),输出以下结果:
+ -------------- + --------- + ------ --- + ------- +
|日期|产品|伙伴|值|
+ -------------- + --------- + --------- + ------- +
| 2017-01-01 | a | x | 10 |
| 2017-01-02 | a | x | 10 |
| 2017-01-03 | a | x | 10 |
| 2017-01-04 | a | x | 10 |
| 2017-01-05 | a | x | 10 |
| 2017-01-06 | a | x | 10 |
| 2017-01-07 | a | x | 15 |
| 2017-01-08 | a | x | 15 |
| 2017-01-09 | a | x | 15 |
| 2017-01-10 | a | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-02 | a | y | 11 |
| 2017-01-03 | a | y | 11 |
| 2017-01-04 | a | y | 11 |
| 2017-01-05 | a | y | 11 |
| 2017-01-06 | a | y | 11 |
| 2017-01-07 | a | y | 15 |
| 2017-01-08 | a | y | 15 |
| 2017-01-09 | a | y | 15 |
| 2017-01-10 | a | y | 15 |
| 2017-01-01 | b | x | 15 |
| 2017-01-02 | b | x | 15 |
| 2017-01-03 | b | x | 15 |
| 2017-01-04 | b | x | 15 |
| 2017-01-05 | b | x | 13 |
| 2017-01-06 | b | x | 13 |
| 2017-01-07 | b | x | 13 |
| 2017-01-08 | b | x | 13 |
| 2017-01-09 | b | x | 13 |
| 2017-01-10 | b | x | 13 |
| 2017-01-01 | b | y | 16 |
| 2017-01-02 | b | y | 16 |
| 2017-01-03 | b | y | 16 |
| 2017-01-04 | b | y | 16 |
| 2017-01-05 | b | y | 16 |
| 2017-01-06 | b | y | 16 |
| 2017-01-07 | b | y | 16 |
| 2017-01-08 | b | y | 16 |
| 2017-01-09 | b | y | 16 |
| 2017-01-10 | b | y | 16 |
+ -------------- + --------- + --------- + ------- +
基本上,每个产品和合作伙伴的组合都包含所有日期差距的价格历史记录。
我很难弄清楚如何完成这项工作,特别是如何在没有发生价格变化的同一日期生成多行。任何想法?
#standardSQL
WITH历史AS(
SELECT'2017-01-01'AS d,'a'AS产品, 'x'AS伙伴,10 AS值UNION ALL
SELECT'2017-01-01'AS d,'b'AS产品,'x'AS伙伴,15 AS值UNION ALL
SELECT'2017 -01-01'AS d,'a'AS产品,'y'AS伙伴,11 AS值UNION ALL
SELECT'2017-01-01'AS d,'b'AS产品,'y'AS伙伴,16 AS值UNION ALL
SELECT'2017-01-05'AS d,'b'AS产品,'x'AS伙伴,13 AS值UNION ALL
SELECT'2017-01-07 'AS d,'a'AS产品,'y'AS伙伴,15 AS值UNION ALL
SELECT'2017-01-07'AS d,'a'AS产品,'x'AS伙伴,15 AS价值
),
daterange AS(
SELECT date_in_range
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01','2017-01- 10'))AS date_in_range
),
temp AS(
SELECT d,product,partner,value,LEAD(d)OVER(PARTITION BY product,partner ORDER BY d)AS next_d
FROM历史
ORDER BY产品,合作伙伴,d
)
SELECT date_in_range,产品,合作伙伴,价值
FROM daterange
JOIN temp
ON daterange.date_in_range> = PARSE_DATE('%Y-%m-%d',temp.d)
AND(daterange.date_in_range< PARSE_DATE('%Y-%m-%d',temp.next_d)或temp.next_d IS NULL)
ORDER BY product,partner,date_in_range
I've found a similar question (Duplicating records to fill gap between dates in Google BigQuery), however with a different scenario and the answer there does not apply.
I have data structured like so (which is basically price-change history for multiple products and partners):
+------------+---------+---------+-------+
| date | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-01 | b | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-01 | b | y | 16 |
| 2017-01-05 | b | x | 13 |
| 2017-01-07 | a | y | 15 |
| 2017-01-07 | a | x | 15 |
+------------+---------+---------+-------+
What I need is a query (specifically written in BigQuery Standard SQL) that, given a date range (in this case, 2017-01-01
to 2017-01-10
), outputs the following result:
+--------------+---------+---------+-------+
| date | product | partner | value |
+--------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-02 | a | x | 10 |
| 2017-01-03 | a | x | 10 |
| 2017-01-04 | a | x | 10 |
| 2017-01-05 | a | x | 10 |
| 2017-01-06 | a | x | 10 |
| 2017-01-07 | a | x | 15 |
| 2017-01-08 | a | x | 15 |
| 2017-01-09 | a | x | 15 |
| 2017-01-10 | a | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-02 | a | y | 11 |
| 2017-01-03 | a | y | 11 |
| 2017-01-04 | a | y | 11 |
| 2017-01-05 | a | y | 11 |
| 2017-01-06 | a | y | 11 |
| 2017-01-07 | a | y | 15 |
| 2017-01-08 | a | y | 15 |
| 2017-01-09 | a | y | 15 |
| 2017-01-10 | a | y | 15 |
| 2017-01-01 | b | x | 15 |
| 2017-01-02 | b | x | 15 |
| 2017-01-03 | b | x | 15 |
| 2017-01-04 | b | x | 15 |
| 2017-01-05 | b | x | 13 |
| 2017-01-06 | b | x | 13 |
| 2017-01-07 | b | x | 13 |
| 2017-01-08 | b | x | 13 |
| 2017-01-09 | b | x | 13 |
| 2017-01-10 | b | x | 13 |
| 2017-01-01 | b | y | 16 |
| 2017-01-02 | b | y | 16 |
| 2017-01-03 | b | y | 16 |
| 2017-01-04 | b | y | 16 |
| 2017-01-05 | b | y | 16 |
| 2017-01-06 | b | y | 16 |
| 2017-01-07 | b | y | 16 |
| 2017-01-08 | b | y | 16 |
| 2017-01-09 | b | y | 16 |
| 2017-01-10 | b | y | 16 |
+--------------+---------+---------+-------+
Basically a price history with all date gaps filled, for every combination of product and partner.
I'm having a hard time figuring out how to get this done, especially how to generate multiple rows for the same date where no price change has happened. Any ideas?
Try below
#standardSQL
WITH history AS (
SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value
),
daterange AS (
SELECT date_in_range
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range
),
temp AS (
SELECT d, product, partner, value, LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
FROM history
ORDER BY product, partner, d
)
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d)
AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range
这篇关于重复的记录组以填充Google BigQuery中的多个日期间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!