I have the following dataset at customer-SKU level:

Customer customer_name  customer_category sku sku_name week   revenue
       1 abc            dsas              221 test     201701     100
       1 abc            dsas              221 test     201704      90
       1 abc            dsas              222 new      201701     100


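I need to create rows for every week between the minimum and maximum week above.

The customer and SKU attributes (customer_name, customer_category, sku, sku_name) should be refilled for the new rows, but revenue for the new rows should be 0.

The data is at customer-SKU-week level.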

The output should look like this:

Customer customer_name customer_category sku sku_name week   revenue
       1 abc           dsas              221 test     201701     100  *total 4 rows
       1 abc           dsas              221 test     201702       0
       1 abc           dsas              221 test     201703       0
       1 abc           dsas              221 test     201704      90
       1 abc           dsas              222 new      201701     100  *total 1 row


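I tried right-joining the table above with a dummy table containing all possible week values.

I would like to know how to refill the customer and SKU attributes for the new rows while still keeping revenue at 0.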

Best answer

In Spark SQL, using the DataFrame API, this can be done with the following steps:


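Use the window function lead to get the next week within each customer/sku partition
Use the sequence function to create an array of the missing weeks
Use explode_outer to explode the array into one row per week
Use another window function, row_number, together with a when condition to set the revenue column to 0 for the missing weeks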
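
To make the steps easier to follow, here is a small plain-Python sketch (not Spark) of the same logic applied to the sample rows from the question. The function name fill_missing_weeks and the dict layout are only illustrative assumptions, and, like the steps above, it assumes all weeks fall within one year.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, one dict per customer/sku/week.
rows = [
    {"customer": 1, "customer_name": "abc", "customer_category": "dsas",
     "sku": 221, "sku_name": "test", "week": 201701, "revenue": 100},
    {"customer": 1, "customer_name": "abc", "customer_category": "dsas",
     "sku": 221, "sku_name": "test", "week": 201704, "revenue": 90},
    {"customer": 1, "customer_name": "abc", "customer_category": "dsas",
     "sku": 222, "sku_name": "new", "week": 201701, "revenue": 100},
]


def fill_missing_weeks(rows):
    """Hypothetical helper mimicking lead + sequence + explode + row_number."""
    ordered = sorted(rows, key=itemgetter("customer", "sku", "week"))
    filled = []
    # Partition by customer and sku, like Window.partitionBy('customer', 'sku').
    for _, group in groupby(ordered, key=itemgetter("customer", "sku")):
        group = list(group)
        for i, row in enumerate(group):
            # lead: week of the next row in the partition, None for the last row.
            next_week = group[i + 1]["week"] if i + 1 < len(group) else None
            # sequence: the existing week plus every gap week before next_week.
            if next_week is not None:
                weeks = range(row["week"], next_week)
            else:
                weeks = [row["week"]]
            # explode: one output row per week; attributes are copied from the
            # source row, and only the first row keeps the original revenue.
            for j, week in enumerate(weeks):
                revenue = row["revenue"] if j == 0 else 0
                filled.append({**row, "week": week, "revenue": revenue})
    return filled


filled = fill_missing_weeks(rows)
for r in filled:
    print(r["customer"], r["sku"], r["sku_name"], r["week"], r["revenue"])
```

Because every generated row starts as a copy of an existing row, the customer and SKU attributes are refilled automatically; only week and revenue are overwritten.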


Assuming the data lies within a single year (otherwise the sequence must be adjusted), the query might look like this (in PySpark):

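from pyspark.sql.functions import lead, row_number, col, expr, explode_outer, coalesce, when
from pyspark.sql import Window

# w: rows of one customer/sku ordered by week, used to look up the next week
w = Window.partitionBy('customer', 'sku').orderBy('week')
# w2: rows generated from one original row, ordered by the generated week
w2 = Window.partitionBy('customer', 'sku', 'week').orderBy('week_2')

result = (
  df
  # next existing week in the partition (null for the last row)
  .withColumn('next_week', lead('week').over(w))
  # weeks from the current one up to, but excluding, the next existing week
  .withColumn('missed', expr("sequence(week, next_week-1, 1)"))
  # one row per generated week; explode_outer keeps rows whose array is null
  .withColumn('week_2', explode_outer('missed'))
  .withColumn('week_3', coalesce('week_2', 'week'))
  # r == 1 marks the original row, r > 1 marks the rows that fill a gap
  .withColumn('r', row_number().over(w2))
  .withColumn('revenue_2', when(col('r') != 1, 0).otherwise(col('revenue')))
  .select(
    'customer', 'customer_name', 'customer_category', 'sku', 'sku_name',
    col('week_3').alias('week'),
    col('revenue_2').alias('revenue')
  )
)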

result.show()
+--------+-------------+-----------------+---+--------+------+-------+
|customer|customer_name|customer_category|sku|sku_name|week  |revenue|
+--------+-------------+-----------------+---+--------+------+-------+
|1       |abc          |dsas             |221|test    |201701|100    |
|1       |abc          |dsas             |221|test    |201702|0      |
|1       |abc          |dsas             |221|test    |201703|0      |
|1       |abc          |dsas             |221|test    |201704|90     |
|1       |abc          |dsas             |222|new     |201701|100    |
+--------+-------------+-----------------+---+--------+------+-------+
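
The single-year assumption matters because sequence(week, next_week-1, 1) does plain integer arithmetic on the week codes, so a gap from 201652 to 201701 would produce codes such as 201653 to 201700 that are not real weeks. As a rough sketch of one possible adjustment, the weeks could be generated from calendar dates instead. This assumes the codes are ISO weeks in YYYYWW form, which the question does not state; the helper names are hypothetical and the code is plain Python (3.8+), not Spark.

```python
from datetime import date, timedelta
from typing import List


def week_code_to_date(code: int) -> date:
    """Monday of the ISO week encoded as YYYYWW (assumed encoding)."""
    year, week = divmod(code, 100)
    return date.fromisocalendar(year, week, 1)


def date_to_week_code(d: date) -> int:
    """YYYYWW code of the ISO week that contains the given date."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week


def weeks_between(start: int, end: int) -> List[int]:
    """All week codes from start to end inclusive, stepping 7 days at a time."""
    current = week_code_to_date(start)
    last = week_code_to_date(end)
    codes = []
    while current <= last:
        codes.append(date_to_week_code(current))
        current += timedelta(weeks=1)
    return codes


print(weeks_between(201651, 201702))
```

In Spark, one option would be to wrap a function like this in a UDF that returns an array of integers and use it in place of the sequence expression, at the cost of slower execution than the built-in function.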


I hope this is what you need.

Regarding "mysql - Creating rows between column values in SQL", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57712056/
