How to filter a column on values in a list in PySpark?
Problem description
I have a DataFrame of raw data (dfRawData) and need to filter it on column X, keeping only the rows whose value is CB, CI, or CR. I used the code below:
df = dfRawData.filter(col("X").between("CB","CI","CR"))
But I get the following error:
between() takes exactly 3 arguments (4 given)
Please let me know how I can resolve this issue.
Recommended answer
The between function checks whether a value lies between two values: it takes exactly a lower bound and an upper bound (the column itself counts as the third argument in the error message, which is why passing three values is reported as 4 given). It cannot be used to check whether a column value is in a list. To do that, use isin:
import pyspark.sql.functions as f
df = dfRawData.where(f.col("X").isin(["CB", "CI", "CR"]))
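To see why between would give the wrong result even when called with only two bounds, here is a minimal plain-Python sketch of the two predicates. It does not use Spark: the helper functions between and isin and the sample values are hypothetical stand-ins, written on the assumption that Spark compares these short uppercase strings in the same lexicographic order as Python does.

```python
# Plain-Python analogy (no Spark required) for the two column predicates.
# The sample values and both helper functions are illustrative assumptions.
values = ["CA", "CB", "CC", "CI", "CR", "CZ"]  # hypothetical contents of column X
wanted = ["CB", "CI", "CR"]


def between(value, lower, upper):
    """Inclusive range check, analogous to Column.between(lower, upper)."""
    return lower <= value <= upper


def isin(value, candidates):
    """Membership check, analogous to Column.isin(candidates)."""
    return value in candidates


# A range check also lets through values that merely sort inside the range.
range_matches = [v for v in values if between(v, "CB", "CR")]

# A membership check keeps only the listed values.
list_matches = [v for v in values if isin(v, wanted)]

print(range_matches)  # ['CB', 'CC', 'CI', 'CR']
print(list_matches)   # ['CB', 'CI', 'CR']
```

The range check keeps CC because it sorts between CB and CR, while the membership check keeps only the three listed values. In PySpark itself, isin also accepts the values as separate arguments, for example f.col("X").isin("CB", "CI", "CR"), and the condition can be negated with ~ to exclude the listed values instead.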