Problem Description
Right now, I have to use df.count > 0 to check whether the DataFrame is empty or not, but that is kind of inefficient. Is there a better way to do it?
Thanks.
PS: I want to check if it's empty so that I only save the DataFrame if it's not empty.
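For context, a minimal sketch of that pattern as it currently stands (the writer call and output path are illustrative, not from the question):
// Current approach: count triggers a full evaluation just to test emptiness
if (df.count > 0) {
  df.write.parquet("/tmp/output")  // hypothetical writer call and path
}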
Recommended Answer
For Spark 2.1.0, my suggestion would be to use head(n: Int) or take(n: Int) with isEmpty, whichever has the clearest intent to you.
df.head(1).isEmpty
df.take(1).isEmpty
with the Python equivalent:
len(df.head(1)) == 0  # or: not df.head(1)
len(df.take(1)) == 0  # or: not df.take(1)
Using df.first() and df.head() will both throw a java.util.NoSuchElementException if the DataFrame is empty. first() calls head() directly, which calls head(1).head:
def first(): T = head()
def head(): T = head(1).head
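To see the difference in behaviour, a small sketch (assuming an active SparkSession named spark; the exact exception message may vary by Spark version):
val empty = spark.emptyDataFrame
// empty.first()   // throws java.util.NoSuchElementException
// empty.head()    // same: head() is head(1).head on an empty Array
val rows = empty.head(1)  // returns an empty Array[Row] instead of throwing
rows.isEmpty              // true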
head(1) returns an Array, so taking head on that Array throws the java.util.NoSuchElementException when the DataFrame is empty:
def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)
So instead of calling head(), use head(1) directly to get the array, and then you can use isEmpty.
take(n) is also equivalent to head(n)...
def take(n: Int): Array[T] = head(n)
And limit(1).collect() is equivalent to head(1) (notice the limit(n).queryExecution in the head(n: Int) method), so the following are all equivalent, at least from what I can tell, and you won't have to catch a java.util.NoSuchElementException when the DataFrame is empty. This is also why these checks are cheaper than df.count: the limit(1) means Spark only has to produce at most one row rather than evaluate the whole DataFrame.
df.head(1).isEmpty
df.take(1).isEmpty
df.limit(1).collect().isEmpty
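Tying this back to the PS in the question, a minimal sketch of the save-only-if-non-empty helper (the helper name, output format, and path parameter are placeholders, not a fixed API):
import org.apache.spark.sql.DataFrame
def saveIfNonEmpty(df: DataFrame, path: String): Unit = {
  // head(1) fetches at most one row, so this avoids a full count
  if (df.head(1).nonEmpty) {
    df.write.mode("overwrite").parquet(path)  // any writer/mode works here
  }
}
Any of the three equivalent checks above could be substituted for df.head(1).nonEmpty.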
I know this is an older question, so hopefully it will help someone using a newer version of Spark.