在上一个问题中,给出了执行查询的答案,该查询将在新列中获取上一年的收入。这非常有效,但是现在我有一个后续问题。 (please review this link to have a look at the previous question)

用于获取此数据的查询(感谢Mikhail):

#standardSQL
SELECT
  a.date, a.location, a.revenue,
  DATE_SUB(a.date, INTERVAL 1 YEAR) date_last_year,
  IFNULL(b.revenue, 0) revenue_last_year
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.location = b.location
AND DATE_SUB(a.date, INTERVAL 1 YEAR) = b.date


简化的结果如下所示(限于1个位置):

date        revenue     revenue_last_year
2019-01-31  1471,2577   2185,406
2019-01-30  1291,1111   4723,7439
2019-01-29  2178,6532   2263,5283
2019-01-28  1531,8021   0
2019-01-26  1578,1247   2446,6234
2019-01-25  1299,644    1522,4537
2019-01-24  788,2669    1979,104
2019-01-23  787,441     4117,7927
2019-01-22  2437,2951   1876,2479
2019-01-21  1071,0476   0
2019-01-19  2291,0456   2289,8657


后续问题与去年的工作日有关。如您所见,去年的收入值为“ 0”。那是因为位置A关闭的那天。但是,为了每天进行准确的比较,我们需要找到该营业日的营业额。

有关更多信息,请查看下表,以了解我们在今年一月的最后10天中的情况,并在两栏中添加手动找到的去年收入:

date        revenue     revenue_last_year   date        revenue
2019-01-31  1471,2577   2185,406            2018-01-31  2185,406
2019-01-30  1291,1111   4723,7439           2018-01-30  4723,7439
2019-01-29  2178,6532   2263,5283           2018-01-29  2263,5283
2019-01-28  1531,8021   0                   2018-01-27  2843,6616
2019-01-26  1578,1247   2446,6234           2018-01-26  2446,6234
2019-01-25  1299,644    1522,4537           2018-01-25  1522,4537
2019-01-24  788,2669    1979,104            2018-01-24  1979,104
2019-01-23  787,441     4117,7927           2018-01-23  4117,7927
2019-01-22  2437,2951   1876,2479           2018-01-22  1876,2479
2019-01-21  1071,0476   0                   2018-01-20  2561,4086
2019-01-19  2291,0456   2289,8657           2018-01-19  2289,8657


请注意日期的差异。

什么是解决此问题的好方法?是否有必要针对工作日的查询进行调整,您将如何处理?

最佳答案

以下是BigQuery标准SQL

#standardSQL
SELECT
  a.date, a.location, ANY_VALUE(a.revenue) revenue,
  ARRAY_AGG(
    STRUCT(b.date AS date_last_year, b.revenue AS revenue_last_year)
    ORDER BY b.date DESC LIMIT 1
  )[OFFSET(0)].*
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.location = b.location
AND b.date BETWEEN DATE_SUB(DATE_SUB(a.date, INTERVAL 1 YEAR), INTERVAL 7 DAY) AND DATE_SUB(a.date, INTERVAL 1 YEAR)
GROUP BY a.date, a.location


您可以使用虚拟/样本数据(与上一个问题的答案相同)来测试,玩游戏,如以下示例所示

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2018-02-20' `date`, 'A' location, 1 revenue UNION ALL
  SELECT '2018-02-21', 'A', 3 UNION ALL
  SELECT '2019-02-20', 'A', 5 UNION ALL
  SELECT '2019-02-21', 'A', 7 UNION ALL
  SELECT '2019-02-22', 'A', 9 UNION ALL
  SELECT '2018-02-20', 'B', 2 UNION ALL
  SELECT '2018-02-22', 'B', 4 UNION ALL
  SELECT '2019-02-20', 'B', 6 UNION ALL
  SELECT '2019-02-21', 'B', 8 UNION ALL
  SELECT '2019-02-22', 'B', 10
)
SELECT
  a.date, a.location, ANY_VALUE(a.revenue) revenue,
  ARRAY_AGG(
    STRUCT(b.date AS date_last_year, b.revenue AS revenue_last_year)
    ORDER BY b.date DESC LIMIT 1
  )[OFFSET(0)].*
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.location = b.location
AND b.date BETWEEN DATE_SUB(DATE_SUB(a.date, INTERVAL 1 YEAR), INTERVAL 7 DAY) AND DATE_SUB(a.date, INTERVAL 1 YEAR)
GROUP BY a.date, a.location
-- ORDER BY a.date, a.location


结果

Row date        location    revenue date_last_year  revenue_last_year
1   2019-02-20  A           5       2018-02-20      1
2   2019-02-20  B           6       2018-02-20      2
3   2019-02-21  A           7       2018-02-21      3
4   2019-02-21  B           8       2018-02-20      2
5   2019-02-22  A           9       2018-02-21      3
6   2019-02-22  B           10      2018-02-22      4


注意日期的差异:o)

10-05 19:40