This article explains how to delete a column from a PySpark DataFrame; the answer below may be a useful reference for anyone facing the same problem.
Question
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
There are two id: bigint columns, and I want to delete one. How can I do that?
Solution
Reading the Spark documentation, I found an easier solution.
Since Spark version 1.4 there is a function drop(col)
that can be used on a DataFrame in PySpark.
You can use it in two ways:
df.drop('age')
df.drop(df.age)