Problem description
I have a PySpark DataFrame as shown below:
+--------------------+---+
| _c0|_c1|
+--------------------+---+
|{"object":"F...| 0|
|{"object":"F...| 1|
|{"object":"F...| 2|
|{"object":"E...| 3|
|{"object":"F...| 4|
|{"object":"F...| 5|
|{"object":"F...| 6|
|{"object":"S...| 7|
|{"object":"F...| 8|
The column _c0 contains a string in dictionary (JSON) form:
'{"object":"F", "time":"2019-07-18T15:08:16.143Z", "values":[0.22124142944812775,0.2147877812385559,0.16713131964206696,0.3102800250053406,0.31872493028640747,0.3366488814353943,0.25324496626853943,0.14537988603115082,0.12684473395347595,0.13864757120609283,0.15222792327404022,0.238663449883461,0.22896413505077362,0.237777978181839]}'
How can I convert the above string to a dictionary, fetch each key-value pair, and store it in a variable? I don't want to convert the DataFrame to pandas because that is expensive.
Accepted answer
You should use the PySpark equivalents of the Scala Dataset API's withColumn and the from_json standard function: define a schema that matches the JSON string, parse the column with from_json, and then select the resulting struct fields as ordinary columns.