问题描述
输入:
item loc qty year month
A IND 10 2019 13
A IND 20 2020 1
A IND 10 2020 2
A IND 40 2020 3
A IND 50 2020 5
A IND 10 2020 6
输出:
item loc sum(qty) year month
A IND 0 2019 13
A IND 10 2020 1
A IND 30 2020 2
A IND 40 2020 3
A IND 50 2020 5
A IND 90 2020 6
说明:
我将如何得到我的输出如下:
how will i get my output is as follows:
如果我计算的是 2020 年和第 3 个月,那么我需要考虑(第 3 个月)和(第 1 个月)之间的总和(数量),即在这种情况下,它将是从 2019 年第 12 个月到 2020 年和第 2 个月
if i am calculationg for year 2020 and month 3 then i need to consider the sum(qty) between (month-3) and (month-1) i.e. in this case it will be from year 2019 month 12 to year 2020 and month 2
因此对于 2020 年和第 3 个月,输出将为 sum(qty)=10+20+10=40
so for year 2020 and month 3 the ouput will be sum(qty)=10+20+10=40
现在是 2020 年和第 6 个月
now for year 2020 and month 6
sum(qty) 将在 2020 年和月份 -3=3 和 2020 年和月份-1=5 之间
sum(qty) will be between year 2020 and month -3=3 and year 2020 and month-1=5
so sum(qty)=0(0 表示第 4 个月不在表中)+40+50=90
so sum(qty)=0(0 for month 4 which is not in the table)+40+50=90
推荐答案
试试这个.
df.createOrReplaceTempView("test")
spark.sql("""
SELECT
item,
loc,
COALESCE(
SUM(qty) OVER (
PARTITION BY item
ORDER BY (year - 2000) * 13 + month
RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING
), 0) as sum_qty,
year,
month
FROM
test
""").show
+----+---+-------+----+-----+
|item|loc|sum_qty|year|month|
+----+---+-------+----+-----+
| A|IND| 0|2019| 13|
| A|IND| 10|2020| 1|
| A|IND| 30|2020| 2|
| A|IND| 40|2020| 3|
| A|IND| 50|2020| 5|
| A|IND| 90|2020| 6|
+----+---+-------+----+-----+
这篇关于在sparksql中为每年和每月做累积总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!