I am combining two PySpark DataFrames as follows:
exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)
But I get this error:
AssertionError: all exprs should be Column
What is wrong?
Best answer
exprs = [max(x) for x in ["col1","col2"]]
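calls Python's built-in max, which, applied to a string, returns the character with the highest ASCII value, so it evaluates to
['o', 'o']
These are plain strings, not Column objects, which is why agg() raises the assertion. Referencing the correct max, the one from pyspark.sql.functions, works:
>>> from pyspark.sql import functions as F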
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]
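To see why the original list comprehension produces ['o', 'o'], here is a minimal pure-Python sketch (no Spark needed) that reproduces what the built-in max does with the column names from the question. It only illustrates the built-in behavior; it does not call any PySpark API.

```python
# Python's built-in max() iterates over a string character by character
# and returns the character with the highest code point.
columns = ["col1", "col2"]
exprs = [max(x) for x in columns]
print(exprs)  # ['o', 'o']

# Code points of each character in "col1", showing why 'o' wins
codes = {ch: ord(ch) for ch in "col1"}
print(codes)  # {'c': 99, 'o': 111, 'l': 108, '1': 49}

# The results are plain str objects, not pyspark Column objects,
# so agg() rejects them with "all exprs should be Column".
print(all(isinstance(e, str) for e in exprs))  # True
```

The same name, max, refers to two different functions here, which is why the answer qualifies it as F.max: the module prefix makes it unambiguous which one builds a Spark aggregate expression.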