How to save the result of printSchema to a file in PySpark

Problem description

I called df.printSchema() in PySpark, and it prints the schema as a tree. Now I need to save that output in a variable or a text file.

I tried the following ways of saving it, but neither worked.

v = str(df.printSchema())
print(v)
# and
df.printSchema().saveAsTextFile(<path>)
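
Both attempts fail for the same reason: printSchema() prints the tree and returns nothing. A minimal sketch of that behavior follows; `print_tree` is a hypothetical stand-in for df.printSchema(), used so the snippet runs without a Spark session.

```python
def print_tree():
    """Hypothetical stand-in for df.printSchema(): prints the tree, returns None."""
    print("root")
    print(" |-- COVERSHEET: struct (nullable = true)")


result = print_tree()  # the tree is written to stdout at this point
v = str(result)        # str(None) is the literal text "None", not the tree
```

The second attempt breaks one step earlier: the return value is None, and None has no saveAsTextFile method, so the call raises an AttributeError.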

I need the saved schema in the following format:

 |-- COVERSHEET: struct (nullable = true)
 |    |-- ADDRESSES: struct (nullable = true)
 |    |    |-- ADDRESS: struct (nullable = true)
 |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |-- _city: string (nullable = true)
 |    |    |    |-- _primary: long (nullable = true)
 |    |    |    |-- _state: string (nullable = true)
 |    |    |    |-- _street: string (nullable = true)
 |    |    |    |-- _type: string (nullable = true)
 |    |    |    |-- _zip: long (nullable = true)
 |    |-- CONTACTS: struct (nullable = true)
 |    |    |-- CONTACT: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |    |-- _name: string (nullable = true)
 |    |    |    |    |-- _type: string (nullable = true)

Recommended answer

You need treeString (which, for some reason, I couldn't find in the Python API). It can be reached through the DataFrame's underlying JVM object:

# v will be a string
v = df._jdf.schema().treeString()
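
If relying on the private _jdf attribute is a concern, one possible alternative is to capture what printSchema() prints. The sketch below assumes printSchema() writes through Python's sys.stdout, which matches the PySpark versions I am aware of but is not guaranteed; output written directly by the JVM would not be captured. `capture_printed` and `fake_print_schema` are hypothetical names introduced for illustration.

```python
import io
from contextlib import redirect_stdout


def capture_printed(fn):
    """Run fn() and return whatever it printed through Python's sys.stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        fn()
    return buf.getvalue()


# Stand-in so the sketch runs without a Spark session.
# With a real DataFrame you would pass df.printSchema instead.
def fake_print_schema():
    print("root")
    print(" |-- COVERSHEET: struct (nullable = true)")


v = capture_printed(fake_print_schema)  # v now holds the printed tree as a string
```

Note that the method itself is passed (no parentheses), so it is only called inside the redirect.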

You can then convert the string to an RDD and use saveAsTextFile. Note that saveAsTextFile writes a directory of part files at the given path rather than a single file.

sc.parallelize([v]).saveAsTextFile(...)

Or use plain Python file I/O to write the string to a file.
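
A minimal sketch of the plain-Python route, assuming the schema string is already in a variable. The helper name `save_schema_text` and the output location are hypothetical; a short literal stands in for the result of df._jdf.schema().treeString() so the snippet runs without Spark.

```python
import tempfile
from pathlib import Path


def save_schema_text(schema_text: str, path: str) -> None:
    """Write the schema tree string to a single local text file."""
    Path(path).write_text(schema_text, encoding="utf-8")


# In a Spark job this would be: v = df._jdf.schema().treeString()
v = "root\n |-- COVERSHEET: struct (nullable = true)\n"

# Hypothetical output location, chosen only for this example.
out_path = Path(tempfile.gettempdir()) / "schema.txt"
save_schema_text(v, str(out_path))
```

This runs on the driver and writes to the driver's local filesystem, producing one ordinary file instead of a directory of part files.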

08-04 20:33