This article explains how to convert a standard Python list of key-value dictionaries into a PySpark DataFrame.
Question
Consider a list of Python dictionaries of key-value pairs, where each key corresponds to a column name of a table. For the list below, how can it be converted into a PySpark DataFrame with two columns, arg1 and arg2?
[{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]
How can the following construct be used to do it?
df = sc.parallelize([
    ...
]).toDF()
Where should the arg1/arg2 records go in place of the (...) in the code above?
Recommended answer
Old way (Spark warns that inferring the schema directly from dicts is deprecated):
sc.parallelize([{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]).toDF()
New way:
from pyspark.sql import Row
from collections import OrderedDict

def convert_to_row(d: dict) -> Row:
    # Sort the keys so every record produces its fields in the same order,
    # then unpack them as keyword arguments to build the Row.
    return Row(**OrderedDict(sorted(d.items())))

sc.parallelize([{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
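The sorting step is the key detail: in older Python versions, plain dicts did not guarantee key order, so sorting ensures every record yields its fields in a consistent order before Spark infers the schema. A minimal plain-Python sketch of just that step, runnable without a Spark session (`sorted_fields` is a hypothetical helper name mirroring what convert_to_row does before building the Row):

```python
from collections import OrderedDict

def sorted_fields(d: dict) -> OrderedDict:
    # Hypothetical helper: sort keys so every record exposes its
    # fields in the same (alphabetical) order, as convert_to_row does.
    return OrderedDict(sorted(d.items()))

records = [{"arg2": "b", "arg1": "a"}, {"arg1": "x", "arg2": "y"}]
ordered = [sorted_fields(r) for r in records]
print([list(o.keys()) for o in ordered])  # [['arg1', 'arg2'], ['arg1', 'arg2']]
```

With the field order fixed, each OrderedDict can safely be unpacked into a pyspark.sql.Row, which is what the map step in the answer does.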