With Python's Pandas, you can perform an operation on multiple columns in one batch, like this:
# assuming we have a DataFrame with, among others, the following columns
cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8']
df[cols] = df[cols] / df['another_column']
Is there similar functionality when using Spark in Scala?
For now, what I end up doing is:
val df2 = df.withColumn("col1", $"col1" / $"another_column")
.withColumn("col2", $"col2" / $"another_column")
.withColumn("col3", $"col3" / $"another_column")
.withColumn("col4", $"col4" / $"another_column")
.withColumn("col5", $"col5" / $"another_column")
.withColumn("col6", $"col6" / $"another_column")
.withColumn("col7", $"col7" / $"another_column")
.withColumn("col8", $"col8" / $"another_column")
Best Answer
You can use foldLeft to process the list of columns, as shown below:
// assumes a SparkSession is in scope with its implicits imported (for toDF):
// import spark.implicits._
import org.apache.spark.sql.functions.col

val df = Seq(
  (1, 20, 30, 4),
  (2, 30, 40, 5),
  (3, 10, 30, 2)
).toDF("id", "col1", "col2", "another_column")

val cols = Array("col1", "col2")

// fold over the column names, replacing each one with its divided value
val df2 = cols.foldLeft(df)((acc, c) =>
  acc.withColumn(c, col(c) / col("another_column"))
)
df2.show
+---+----+----+--------------+
| id|col1|col2|another_column|
+---+----+----+--------------+
| 1| 5.0| 7.5| 4|
| 2| 6.0| 8.0| 5|
| 3| 5.0|15.0| 2|
+---+----+----+--------------+
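If you want to avoid chaining many withColumn calls (each one adds another projection to the query plan), a single select can rewrite all of the columns in one pass. The sketch below is one way to do it, reusing the df and cols from the example above; it passes any column not in cols through unchanged:

import org.apache.spark.sql.functions.col

// Build one projection: divide the listed columns, keep the rest as-is.
val df3 = df.select(
  df.columns.map { c =>
    if (cols.contains(c)) (col(c) / col("another_column")).as(c)
    else col(c)
  }: _*
)
df3.show

If you are on Spark 3.3 or later, there is also DataFrame.withColumns, which takes a Map[String, Column] and adds or replaces all of them in a single call.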