How to save the result of printSchema to a file in PySpark
Question
I used df.printSchema() in PySpark, and it prints the schema as a tree. Now I need to save that output in a variable or a text file.
I tried the following ways of saving it, but neither worked.
v = str(df.printSchema())
print(v)
#and
df.printSchema().saveAsTextFile(<path>)
I need the saved schema in the following format:
|-- COVERSHEET: struct (nullable = true)
| |-- ADDRESSES: struct (nullable = true)
| | |-- ADDRESS: struct (nullable = true)
| | | |-- _VALUE: string (nullable = true)
| | | |-- _city: string (nullable = true)
| | | |-- _primary: long (nullable = true)
| | | |-- _state: string (nullable = true)
| | | |-- _street: string (nullable = true)
| | | |-- _type: string (nullable = true)
| | | |-- _zip: long (nullable = true)
| |-- CONTACTS: struct (nullable = true)
| | |-- CONTACT: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- _VALUE: string (nullable = true)
| | | | |-- _name: string (nullable = true)
| | | | |-- _type: string (nullable = true)
Recommended answer
You need treeString (which, for some reason, I couldn't find in the Python API). It can be called on the underlying Java DataFrame through df._jdf:
#v will be a string
v = df._jdf.schema().treeString()
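If you would rather not reach into the private _jdf handle, another option is to capture what printSchema() writes to standard output. This is only a sketch: it assumes printSchema() prints through Python's own print, and FakeDF and capture_schema are hypothetical names used so that the example runs without a Spark session.

```python
import contextlib
import io


def capture_schema(df):
    """Return the text that df.printSchema() prints instead of showing it."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.printSchema()  # returns None; its printed output lands in buf
    return buf.getvalue()


class FakeDF:
    """Hypothetical stand-in for a DataFrame, so the sketch runs without Spark."""

    def printSchema(self):
        print("root\n |-- _city: string (nullable = true)")


v = capture_schema(FakeDF())
```

With a real DataFrame, the call would simply be capture_schema(df). This also shows why str(df.printSchema()) fails: the method returns None and only prints the tree.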
You can convert it to an RDD and use saveAsTextFile (note that this writes a directory of part files rather than a single file):
sc.parallelize([v]).saveAsTextFile(...)
Or use Python's own file API to write the string to a file.
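The plain-Python route could look roughly like the sketch below. The tree text is a shortened sample based on the question, and save_schema and the output path are hypothetical names chosen for illustration; with a real DataFrame, v would come from df._jdf.schema().treeString().

```python
import os
import tempfile


def save_schema(tree, path):
    """Write the schema tree string to a text file on the local disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(tree)


# Shortened sample; in a real job: v = df._jdf.schema().treeString()
v = (
    "root\n"
    " |-- COVERSHEET: struct (nullable = true)\n"
    " |    |-- ADDRESSES: struct (nullable = true)\n"
)

# Hypothetical location: a file inside a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "schema.txt")
save_schema(v, out_path)

# Read the file back to confirm the tree was stored unchanged
with open(out_path, encoding="utf-8") as f:
    saved = f.read()
```

This writes to the local filesystem of the machine running the Python driver, so it suits cases where a single local file is wanted.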