本文介绍了使用Window功能将前一行与当前行相加的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个spark数据框,在这里我想根据groupid和id根据当前行的Amount值和上一行的Amount值求和.让我把df放出来

I have a spark dataframe where, I want to calculate a running total based on current row Amount value and Previous row sum of Amount value based on groupid and id. Let me put out the df

import findspark
findspark.init()
import pyspark 
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
import pandas as pd


 sc = spark.sparkContext
data1 = {'date': {0: '2018-04-03', 1: '2018-04-04', 2: '2018-04-05', 3: '2018-04-06', 4: '2018-04-07'},
         'id': {0: 'id1', 1: 'id2', 2: 'id1', 3: 'id3', 4: 'id2'},
         'group': {0: '1', 1: '1', 2: '1', 3: '2', 4: '1'},
         'amount': {0: 50, 1: 40, 2: 50, 3: 55, 4: 20}}
df1_pd = pd.DataFrame(data1, columns=data1.keys())

df1 = spark.createDataFrame(df1_pd)
df1.show()


+----------+---+-----+------+
|      date| id|group|amount|
+----------+---+-----+------+
|2018-04-03|id1|    1|    50|
|2018-04-04|id2|    1|    40|
|2018-04-05|id1|    1|    50|
|2018-04-06|id3|    2|    55|
|2018-04-07|id2|    1|    20|
+----------+---+-----+------+

我正在寻找的投入品

+----------+---+-----+------+---+
|      date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1|    1|    50|50 |
|2018-04-04|id2|    1|    40|90 |
|2018-04-05|id1|    1|    50|140|
|2018-04-06|id3|    2|    55|55 |
|2018-04-07|id2|    1|    20|160|
+----------+---+-----+------+---+

推荐答案

窗口定义:

from pyspark.sql.window import Window
from pyspark.sql.functions import sum

w = Window.partitionBy("group").orderBy("date").rowsBetween(
    Window.unboundedPreceding,  # Take all rows from the beginning of frame
    Window.currentRow           # To current row
)

总和:

(df1.withColumn("sum", sum("amount").over(w))
    .orderBy("date")   # Sort for easy inspection. Not necessary
    .show())

结果:

+----------+---+-----+------+---+      
|      date| id|group|amount|sum|
+----------+---+-----+------+---+
|2018-04-03|id1|    1|    50| 50|
|2018-04-04|id2|    1|    40| 90|
|2018-04-05|id1|    1|    50|140|
|2018-04-06|id3|    2|    55| 55|
|2018-04-07|id2|    1|    20|160|
+----------+---+-----+------+---+

这篇关于使用Window功能将前一行与当前行相加的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-28 08:09