Problem Description
I am trying to obtain all rows in a dataframe where two flags are set to '1', and then all rows where only one of the two flags is set to '1' and the other is NOT EQUAL to '1'.
With the following schema (three columns):
df = sqlContext.createDataFrame(
    [('a', 1, 'null'), ('b', 1, 1), ('c', 1, 'null'),
     ('d', 'null', 1), ('e', 1, 1)],  # ,('f',1,'NaN'),('g','bla',1)
    schema=('id', 'foo', 'bar')
)
I obtain the following dataframe:
+---+----+----+
| id| foo| bar|
+---+----+----+
| a| 1|null|
| b| 1| 1|
| c| 1|null|
| d|null| 1|
| e| 1| 1|
+---+----+----+
When I apply the desired filters, the first one (foo = 1 AND bar = 1) works, but the other (foo = 1 AND NOT bar = 1) does not:
foobar_df = df.filter( (df.foo==1) & (df.bar==1) )
This yields:
+---+---+---+
| id|foo|bar|
+---+---+---+
| b| 1| 1|
| e| 1| 1|
+---+---+---+
Here is the filter that does not behave as expected:
foo_df = df.filter( (df.foo==1) & (df.bar!=1) )
foo_df.show()
+---+---+---+
| id|foo|bar|
+---+---+---+
+---+---+---+
Why is it not filtering? How can I get the rows where only foo is equal to '1'?
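The cause is SQL's three-valued logic, which Spark follows: any comparison involving NULL evaluates to NULL (unknown), not to True or False, and `filter` only keeps rows where the predicate is literally True. So for the rows where `bar` is null, `df.bar != 1` is neither true nor false and the row is dropped. A minimal plain-Python sketch of this behavior (an illustration of the semantics, not Spark code; the helper name `sql_ne` is invented for this example):

```python
def sql_ne(a, b):
    """SQL-style '!=': returns None (unknown) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a != b

rows = [('a', 1, None), ('b', 1, 1), ('c', 1, None),
        ('d', None, 1), ('e', 1, 1)]

# filter(foo == 1 AND bar != 1): a row survives only if the whole
# predicate is True; an unknown (None) comparison drops it, as in Spark.
kept = [r for r in rows if r[1] == 1 and sql_ne(r[2], 1) is True]
print(kept)  # [] -- the bar=NULL rows evaluate to unknown, not True
```

This reproduces the empty result above: every candidate row either fails `bar != 1` outright or evaluates it to unknown.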
Recommended Answer
To filter on null values, try:
foo_df = df.filter( (df.foo==1) & (df.bar.isNull()) )
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.Column.isNull
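`isNull` catches exactly the rows that the plain `!=` comparison drops. If `bar` could also hold non-null values other than 1, the two conditions can be combined with OR, i.e. `df.filter((df.foo == 1) & ((df.bar != 1) | df.bar.isNull()))`. The effect of that combined predicate, modeled in plain Python (a sketch of the semantics, not Spark code; `sql_ne` is an invented helper mimicking SQL `!=`):

```python
def sql_ne(a, b):
    """SQL-style '!=': None (unknown) when either side is NULL."""
    if a is None or b is None:
        return None
    return a != b

rows = [('a', 1, None), ('b', 1, 1), ('c', 1, None),
        ('d', None, 1), ('e', 1, 1)]

# (foo == 1) AND (bar != 1 OR bar IS NULL)
foo_only = [r for r in rows
            if r[1] == 1 and (sql_ne(r[2], 1) is True or r[2] is None)]
print([r[0] for r in foo_only])  # ['a', 'c']
```

Newer Spark versions also provide `Column.eqNullSafe` (the SQL `<=>` operator) for null-safe equality, so `~df.bar.eqNullSafe(1)` would keep both the null rows and the genuinely-not-equal rows in one expression.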
This concludes this article on comparison operators (not equal / !=) in PySpark; hopefully the answer above is helpful.