How to change the position of a column in a Spark DataFrame?


Question

I was wondering whether it is possible to change the position of a column in a DataFrame, that is, to change the order of the columns in its schema.

Specifically, if I have a DataFrame with columns [field1, field2, field3], I would like to get [field1, field3, field2].

I can't post any code. Imagine we are working with a DataFrame with one hundred columns; after some joins and transformations, some of these columns are out of position relative to the schema of the destination table.

How can I move one or several columns, i.e. how can I change the schema?

Recommended answer

You can get the column names, reorder them however you want, and then use select on the original DataFrame to get a new one with the columns in that order:

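import org.apache.spark.sql.DataFrame
// Current column names, in their current order
val columns: Array[String] = dataFrame.columns
val reorderedColumnNames: Array[String] = ??? // do the reordering you want
// select(String, String*) takes the first name separately from the rest
val result: DataFrame = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)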
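
For the case described in the question, where the desired order is already known from the destination table, the reordering step could look roughly like the sketch below. The object ReorderColumns, the helper reorderTo, and the sample data are hypothetical names made up for illustration, and the local SparkSession is only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReorderColumns {
  // Hypothetical helper: returns df with its columns in the order given by targetOrder.
  // It fails fast when the two sets of names differ, so no column is silently dropped.
  def reorderTo(df: DataFrame, targetOrder: Seq[String]): DataFrame = {
    require(
      df.columns.toSet == targetOrder.toSet,
      s"column mismatch: [${df.columns.mkString(", ")}] vs [${targetOrder.mkString(", ")}]"
    )
    df.select(targetOrder.map(col): _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("reorder-columns")
      .getOrCreate()
    import spark.implicits._

    // Sample DataFrame using the column names from the question
    val dataFrame = Seq((1, "a", true)).toDF("field1", "field2", "field3")

    // Desired order: field3 moved in front of field2
    val result = reorderTo(dataFrame, Seq("field1", "field3", "field2"))
    result.printSchema()

    spark.stop()
  }
}
```

If the destination table is registered in the catalog, its column order could probably be read with spark.table(tableName).columns and passed in as targetOrder; whether that applies depends on how the destination is defined in your environment.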
