Suppose I have a list of Python dictionaries whose keys correspond to table column names. How do I convert the list below into a PySpark DataFrame with the two columns arg1 and arg2?

 [{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]

How can I do this using the construct below?
df = sc.parallelize([
    ...
]).toDF()

In other words, what goes in place of the (...) in the code above so that arg1 and arg2 become the columns?

Best answer

The old way:

# Spark infers the schema directly from the dicts; this is deprecated
# since Spark 2.x and emits a warning, but still works:
sc.parallelize([{"arg1": "", "arg2": ""},
                {"arg1": "", "arg2": ""},
                {"arg1": "", "arg2": ""}]).toDF()
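As an aside, in modern PySpark you can skip the RDD entirely and hand the list straight to SparkSession.createDataFrame. A minimal sketch, assuming a SparkSession is already available as `spark`:

# Minimal sketch, assuming an existing SparkSession named `spark`.
data = [{"arg1": "", "arg2": ""},
        {"arg1": "", "arg2": ""},
        {"arg1": "", "arg2": ""}]

# A DDL schema string (accepted since Spark 2.3) pins the column
# names and types up front instead of inferring them from the dicts.
df = spark.createDataFrame(data, schema="arg1 string, arg2 string")
df.show()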

The new way:
from pyspark.sql import Row
from collections import OrderedDict

def convert_to_row(d: dict) -> Row:
    # Sort the keys so the Row fields, and therefore the DataFrame
    # columns, come out in a deterministic order (arg1, arg2).
    return Row(**OrderedDict(sorted(d.items())))

sc.parallelize([{"arg1": "", "arg2": ""},
                {"arg1": "", "arg2": ""},
                {"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
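The sorted() call is what guarantees arg1 comes before arg2 regardless of the dicts' key order. A quick way to check the result, reusing convert_to_row from above (the commented output is what string inference from empty strings should produce):

df = sc.parallelize([{"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()

df.printSchema()
# root
#  |-- arg1: string (nullable = true)
#  |-- arg2: string (nullable = true)
print(df.columns)  # ['arg1', 'arg2']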
