Problem Description
This is the problem from the attached image:
Table:
Row  Col1  Col2  Col3  Result
1    10    20    100   30
2    20    40    200   60
3    30    60    0     240
4    40    70    0     180
5    30    80    50    110
6    25    35    0     65
7    10    20    60    30
The result column is calculated based on the following rules:
- If col3 > 0, then result = col1 + col2
- If col3 = 0, then result = sum(col2) down through the next row where col3 > 0, plus col1 of that row (the row where col3 > 0)

For example, for row 3: result = 60 + 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 240. For row 4: result = 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 180. Similarly for the others.
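To make the rule concrete, here is a minimal plain-Python sketch that applies it directly to the table above (the helper name `compute_result` is mine, not from the question):

```python
# Table rows as (row, col1, col2, col3) tuples.
rows = [
    (1, 10, 20, 100),
    (2, 20, 40, 200),
    (3, 30, 60, 0),
    (4, 40, 70, 0),
    (5, 30, 80, 50),
    (6, 25, 35, 0),
    (7, 10, 20, 60),
]

def compute_result(rows, i):
    _, col1, col2, col3 = rows[i]
    if col3 > 0:                       # rule 1: col3 > 0 -> col1 + col2
        return col1 + col2
    total = 0                          # rule 2: accumulate col2 downward
    for j in range(i, len(rows)):
        total += rows[j][2]            # add col2 of this row
        if rows[j][3] > 0:             # first row below with col3 > 0 ...
            return total + rows[j][1]  # ... also contributes its col1
    return total

print([compute_result(rows, i) for i in range(len(rows))])
# → [30, 60, 240, 180, 110, 65, 30], matching the Result column
```

This reproduces every value in the Result column, including 240 for row 3 and 180 for row 4.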
Recommended Answer
This answers (correctly, I might add) the original version of the question.
In SQL, you can express this using window functions. Use a cumulative sum to define the group, and then an additional cumulative sum:
select t.*,
       (case when col3 <> 0 then col1 + col2
             else sum(col2 + case when col3 = 0 then col1 else 0 end) over (partition by grp order by row desc)
        end) as result
from (select t.*,
             sum(case when col3 <> 0 then 1 else 0 end) over (order by row desc) as grp
      from t
     ) t;
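To see what the subquery's grp column does, here is a plain-Python sketch of the same idea: a cumulative count of rows with col3 <> 0, scanned in descending row order, assigns each col3 = 0 row to the same group as the nearest row below it where col3 <> 0 (variable names here are mine, for illustration):

```python
# (row, col3) pairs from the table above.
rows = [(1, 100), (2, 200), (3, 0), (4, 0), (5, 50), (6, 0), (7, 60)]

grp = {}
count = 0
for row, col3 in sorted(rows, key=lambda r: -r[0]):  # order by row desc
    if col3 != 0:
        count += 1      # sum(case when col3 <> 0 then 1 else 0 end)
    grp[row] = count

print([grp[r] for r, _ in rows])
# → [4, 3, 2, 2, 2, 1, 1]
```

Rows 3, 4, and 5 all land in group 2, so the outer window sum can aggregate over exactly that block of rows.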
Here is a db<>fiddle (which uses Postgres).
Note:

Your description says that the else logic should be:
else sum(col2) over (partition by grp order by row desc)
Your example says:
else sum(col2 + col3) over (partition by grp order by row desc)
And in my opinion, this seems most logical:
else sum(col1 + col2) over (partition by grp order by row desc)
This concludes the article on computing columns based on some complex logic in PySpark; we hope the recommended answer is helpful.