How to delete a column in a pyspark dataframe

This article explains how to delete a column from a pyspark dataframe. The solution below should be a useful reference for anyone who runs into the same problem.

Problem Description

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]

There are two id: bigint columns and I want to delete one. How can I do that?

Solution

Reading the Spark documentation, I found an easier solution.

Since Spark version 1.4 there is a function drop(col) which can be used on a pyspark dataframe.

You can use it in two ways:

  1. df.drop('age')
  2. df.drop(df.age)

Pyspark Documentation - Drop
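
Applied to the joined dataframe from the question, passing the Column object b.id to drop tells Spark exactly which of the two id columns to remove. The sketch below is illustrative, not part of the original answer: the SparkSession setup, the `spark` session name, and the sample rows are assumptions, and only the column names follow the schemas shown above.

# Minimal sketch, assuming Spark 2.x+ with the SparkSession API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("drop-duplicate-id").getOrCreate()

# Illustrative rows; the exact values and types do not matter for the drop.
a = spark.createDataFrame([(1, "2457100", 42)], ["id", "julian_date", "user_id"])
b = spark.createDataFrame([(1, 100, 5)], ["id", "quan_created_money", "quan_created_cnt"])

joined = a.join(b, a.id == b.id, "outer")

# Both sides contribute a column named `id`, so dropping by name can be
# ambiguous; passing the Column object b.id (drop(col), Spark 1.4+) makes it
# unambiguous which one to remove.
result = joined.drop(b.id)

result.printSchema()  # only the `id` column coming from `a` remains

The same drop(col) form from the answer above is what disambiguates the two columns here; dropping by the string name 'id' would not distinguish between them.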

That concludes this article on how to delete a column in a pyspark dataframe. We hope the answer above is helpful, and thank you for your continued support!
