How to get the name of a DataFrame column in PySpark?

In pandas, the name of a column can be read with column.name.

But how can I do the same for a column of a Spark DataFrame?

For example, the calling program has a Spark DataFrame, spark_df:

>>> spark_df.columns
['admit', 'gre', 'gpa', 'rank']

This program calls my function as my_function(spark_df['rank']). Inside my_function, I need the name of the column, i.e. 'rank'.

If it were a pandas DataFrame, I could use the following inside my_function:

>>> pandas_df['rank'].name
'rank'

Solution

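You can get the column names from the schema:

spark_df.schema.names

This returns the names of all columns in the DataFrame (the same list as spark_df.columns), so on its own it does not tell my_function which column it was given.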

Printing the schema is also a useful way to visualize it:

spark_df.printSchema()
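To recover the name of the specific Column that my_function receives, note that a PySpark Column has no public attribute holding its name as a string: Column.name is an alias of Column.alias. A common workaround, sketched below, is to parse str(col). This sketch assumes the representation formats Column<'rank'> (Spark 3.x) and Column<b'rank'> (Spark 2.x under Python 3); these are implementation details, not a documented API, and for a derived expression such as spark_df['rank'] + 1 the helper returns the expression's text rather than a plain column name. The helper column_name is hypothetical.

```python
import re

# Hypothetical helper. Assumes str(col) looks like "Column<'rank'>" (Spark 3.x)
# or "Column<b'rank'>" (Spark 2.x on Python 3); this is not a public API.
_COLUMN_REPR = re.compile(r"^Column<b?'(.*)'>$")


def column_name(col) -> str:
    """Return the text inside str(col), e.g. 'rank' for spark_df['rank']."""
    text = str(col)
    match = _COLUMN_REPR.match(text)
    if match is None:
        raise ValueError(f"unrecognised Column representation: {text!r}")
    return match.group(1)


def my_function(col):
    # The caller passes e.g. spark_df['rank']; recover 'rank' from its repr.
    return column_name(col)
```

With the DataFrame from the question, my_function(spark_df['rank']) should return 'rank'. If you control the calling code, passing the column name as a string and looking the column up inside the function with spark_df[name] avoids relying on the repr format at all.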
