Problem description
Here is the question from the attached image:
Table:
Row  Col1  Col2  Col3  Result
1    10    20    100   30
2    20    40    200   60
3    30    60    0     240
4    40    70    0     180
5    30    80    50    110
6    25    35    0     65
7    10    20    60    30
The Result column is calculated based on the following rules:
- If col3 > 0, then result = col1 + col2
- If col3 = 0, then result = sum(col2) over the rows up to and including the next row where col3 > 0, plus col1 of that row
For example, for row 3: result = 60 + 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 240. For row 4: result = 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 180. Similarly for the other rows.
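The rule above can be sketched in plain Python to check that it reproduces the Result column (a minimal sketch over the sample table; the `result` helper is hypothetical, not part of the question):

```python
# Sample table as (col1, col2, col3) tuples, rows 1-7.
rows = [
    (10, 20, 100),
    (20, 40, 200),
    (30, 60, 0),
    (40, 70, 0),
    (30, 80, 50),
    (25, 35, 0),
    (10, 20, 60),
]

def result(i):
    col1, col2, col3 = rows[i]
    if col3 > 0:
        return col1 + col2
    # Accumulate col2 until the first subsequent row with col3 > 0
    # (assumes such a row exists, which holds for the sample data).
    total = 0
    j = i
    while rows[j][2] == 0:
        total += rows[j][1]
        j += 1
    # Row j is the first row with col3 > 0: add its col2 and its col1.
    return total + rows[j][1] + rows[j][0]

print([result(i) for i in range(len(rows))])
# [30, 60, 240, 180, 110, 65, 30]
```

This matches the Result column for every row, including the 240 and 180 worked out above.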
Answer
This answers (correctly, I might add) the original version of the question.
In SQL, you can express this using window functions. Use one cumulative sum to define the groups, then an additional cumulative sum within each group:
select t.*,
       (case when col3 <> 0 then col1 + col2
             -- accumulate col2 over the group, plus col1 of the row where col3 <> 0
             else sum(col2 + case when col3 <> 0 then col1 else 0 end)
                  over (partition by grp order by row desc)
        end) as result
from (select t.*,
             sum(case when col3 <> 0 then 1 else 0 end)
                 over (order by row desc) as grp
      from t
     ) t;
Here is a db<>fiddle (which uses Postgres).
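The query can also be tried locally without Postgres: SQLite (3.25+) supports the same window functions. A sketch, assuming the inner `case` tests `col3 <> 0` so the totals match the example rows above (the column name `row` is quoted because it is a keyword):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('create table t ("row" integer, col1 integer, col2 integer, col3 integer)')
con.executemany(
    "insert into t values (?, ?, ?, ?)",
    [(1, 10, 20, 100), (2, 20, 40, 200), (3, 30, 60, 0),
     (4, 40, 70, 0), (5, 30, 80, 50), (6, 25, 35, 0), (7, 10, 20, 60)],
)

query = '''
select "row",
       case when col3 <> 0 then col1 + col2
            -- accumulate col2 over the group, plus col1 of the row where col3 <> 0
            else sum(col2 + case when col3 <> 0 then col1 else 0 end)
                 over (partition by grp order by "row" desc)
       end as result
from (select t.*,
             sum(case when col3 <> 0 then 1 else 0 end)
                 over (order by "row" desc) as grp
      from t) t
order by "row"
'''
print(con.execute(query).fetchall())
# [(1, 30), (2, 60), (3, 240), (4, 180), (5, 110), (6, 65), (7, 30)]
```

The inner query counts, scanning from the bottom up, how many rows with `col3 <> 0` have been seen; that running count is the group id, so each group ends at (and includes) its row with `col3 <> 0`.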
Note:
Your description says the else logic should be:
else sum(col2) over (partition by grp order by row desc)
Your example suggests:
else sum(col2 + col3) over (partition by grp order by row desc)
And in my opinion, this seems most logical:
else sum(col1 + col2) over (partition by grp order by row desc)
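On the sample data these variants do give different totals. For instance, for the group made of rows 3-5 (a hypothetical check in Python, not part of the answer):

```python
# Group made of rows 3-5 of the sample table, as (col1, col2, col3) tuples.
group = [(30, 60, 0), (40, 70, 0), (30, 80, 50)]

sum_col2 = sum(c2 for _, c2, _ in group)             # sum(col2)
sum_col2_col3 = sum(c2 + c3 for _, c2, c3 in group)  # sum(col2 + col3)
sum_col1_col2 = sum(c1 + c2 for c1, c2, _ in group)  # sum(col1 + col2)
print(sum_col2, sum_col2_col3, sum_col1_col2)
# 210 260 310
```

None of these alone equals the 240 in the example; that value only falls out when col1 is added for the single row where col3 > 0 (210 + 30), which is the ambiguity the note is pointing at.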
This concludes the article on computing a column from some complex logic in PySpark; hopefully the answer above is helpful.