I am combining two PySpark DataFrames like this:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:
AssertionError: all exprs should be Column

What is going wrong?

Best answer

exprs = [max(x) for x in ["col1","col2"]]

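uses Python's built-in max, so it returns the character with the highest ASCII value in each string, i.e. ['o', 'o']. These are plain strings, not Column objects, which is why agg raises the AssertionError.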
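A quick way to see this without Spark at all is to run the same list comprehension in a plain Python session. This is only a minimal sketch, and it assumes the name max has not been rebound (for example by a star import from pyspark.sql.functions):

```python
# Python's built-in max() on a string compares its characters one by one,
# so the result is a one-character str, not a Spark Column.
exprs = [max(x) for x in ["col1", "col2"]]

print(exprs)                                   # ['o', 'o']
print(all(isinstance(e, str) for e in exprs))  # True
```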
Referencing the correct max, the one from pyspark.sql.functions, will work:
>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]

10-06 15:00