This article explains how to delete a column from a PySpark DataFrame; the answer below may be a useful reference for anyone facing the same problem.
Question
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
There are two id: bigint columns, and I want to delete one. How can I do that?
Solution
Reading the Spark documentation, I found an easier solution.
Since Spark version 1.4 there is a function drop(col)
that can be used on a DataFrame in PySpark.
You can use it in two ways:
df.drop('age')
df.drop(df.age)