Problem description
+------------------------------------------------------------------+
| message |
+------------------------------------------------------------------+
|{"name":"east-desktop","viewers":447,"emptyCount":0,"version":0.3}|
|{"name":"west-desktop","viewers":111,"emptyCount":0,"version":0.6}|
|{"name":"west-desktop","viewers":115,"emptyCount":0,"version":0.1}|
+------------------------------------------------------------------+
message:string
I have a dataframe that contains JSON data in a single string column. I would like to extract the data either into separate columns or into a JSON file.
I am working in a Databricks notebook using PySpark.
Dataframe
+------------+-------+----------+-------+
|name        |viewers|emptyCount|version|
+------------+-------+----------+-------+
|east-desktop|447    |0         |0.3    |
|west-desktop|111    |0         |0.6    |
|west-desktop|115    |0         |0.1    |
+------------+-------+----------+-------+
Or JSON
{
    "name": "east-desktop",
    "viewers": 447,
    "emptyCount": 0,
    "version": 0.3
}
Recommended answer
pault was right that this is pretty much the same question, but you can use the following sample to get your dataframe output:
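from pyspark.sql.functions import col, from_json
from pyspark.sql.types import FloatType, IntegerType, StringType, StructField, StructType

# One-row sample; with a plain StringType schema the column is named "value".
# spark is the SparkSession that a Databricks notebook provides.
df_new = spark.createDataFrame([
    str({"name":"east-desktop","viewers":447,"emptyCount":0,"version":0.3})
], StringType())

# Schema describing the fields inside the JSON string.
schema = StructType(
    [
        StructField('name', StringType(), True),
        StructField('viewers', IntegerType(), True),
        StructField('emptyCount', IntegerType(), True),
        StructField('version', FloatType(), True)
    ]
)

# Parse the string into a struct column, then expand the struct into top-level columns.
df_new.withColumn("data", from_json("value", schema)).select("value", col('data.*')).show(truncate=False)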
Output:
+-------------------------------------------------------------------------+------------+-------+----------+-------+
|value |name |viewers|emptyCount|version|
+-------------------------------------------------------------------------+------------+-------+----------+-------+
|{'emptyCount': 0, 'version': 0.3, 'name': 'east-desktop', 'viewers': 447}|east-desktop|447 |0 |0.3 |
+-------------------------------------------------------------------------+------------+-------+----------+-------+
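Note that the value column in the output above is single-quoted. The sample builds its input with str(), which yields a Python dict repr rather than strict JSON. As far as I know, Spark's JSON parser accepts single quotes by default (the allowSingleQuotes option), which would explain why from_json still parses it. The message column in the question already holds double-quoted JSON, so this only matters when building test data by hand. The plain-Python sketch below shows the difference, under that assumption. The record variable and the is_strict_json helper are illustrative names that are not part of the original answer.

```python
import json

record = {"name": "east-desktop", "viewers": 447, "emptyCount": 0, "version": 0.3}

as_repr = str(record)         # Python repr: single-quoted keys and strings
as_json = json.dumps(record)  # strict JSON: double-quoted keys and strings


def is_strict_json(text):
    """Return True if text parses with Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(as_repr)
print(as_json)
print(is_strict_json(as_repr), is_strict_json(as_json))
```

Building the sample row with json.dumps instead of str() would keep the test data in the same format as the real message column and would not depend on the parser being lenient about quotes.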