Problem description
I am trying to find the count of distinct values in each column using Pandas. This is what I did.
import pandas as pd
import numpy as np
# Generate data.
NROW = 10000
NCOL = 100
df = pd.DataFrame(np.random.randint(1, 100000, (NROW, NCOL)),
                  columns=['col' + x for x in np.arange(NCOL).astype(str)])
I need to count the number of distinct elements for each column, like this:
col0 9538
col1 9505
col2 9524
What is the most efficient way to do this? The method will be applied to files larger than 1.5 GB.
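One possible approach for files too large to load at once (not taken from the answers) is to read the CSV in chunks with pd.read_csv(..., chunksize=...) and keep a running set of values per column. This is a minimal sketch assuming the data is a CSV file with a header row; count_distinct_chunked is a hypothetical helper name.

```python
import io

import pandas as pd


def count_distinct_chunked(source, chunksize=100_000):
    """Hypothetical helper: count distinct non-null values per column.

    Assumes `source` is a CSV path or file-like object with a header row.
    Only one chunk of rows is held in memory at a time, but the per-column
    sets of values seen so far are kept for the whole file.
    """
    seen = {}  # column name -> set of values seen so far
    for chunk in pd.read_csv(source, chunksize=chunksize):
        for col in chunk.columns:
            # dropna() mirrors nunique's default of ignoring NaN
            seen.setdefault(col, set()).update(chunk[col].dropna().unique())
    return pd.Series({col: len(values) for col, values in seen.items()})


csv_text = "a,b,c\n0,1,1\n1,2,1\n1,3,1\n2,4,1\n3,5,1\n"
print(count_distinct_chunked(io.StringIO(csv_text), chunksize=2).tolist())  # [4, 5, 1]
```

Memory still grows with the number of distinct values per column, so for nearly unique columns like the ones generated above the sets can end up larger than the column data itself.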
Based on the answers, df.apply(lambda x: len(x.unique())) is the fastest on this data (notebook):
%timeit df.apply(lambda x: len(x.unique()))
10 loops, best of 3: 49.5 ms per loop
%timeit df.nunique()
10 loops, best of 3: 59.7 ms per loop
%timeit df.apply(pd.Series.nunique)
10 loops, best of 3: 60.3 ms per loop
%timeit df.T.apply(lambda x: x.nunique(), axis=1)
10 loops, best of 3: 60.5 ms per loop
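One caveat on comparing these timings: len(x.unique()) and nunique() only agree when a column has no missing values. unique() keeps NaN as one of the returned values, while nunique() drops NaN by default (dropna=True). The randint data above has no NaN, so the comparison is like for like there. The small frame below is made up to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 1.0, np.nan, 2.0], 'b': [1.0, 2.0, 3.0, 4.0]})

# len(x.unique()) keeps NaN as one of the distinct values
via_unique = df.apply(lambda x: len(x.unique()))
# nunique() drops NaN by default (dropna=True)
via_nunique = df.nunique()

print(via_unique.tolist())   # [3, 4]
print(via_nunique.tolist())  # [2, 4]
```

Passing dropna=False to nunique makes it count NaN the same way len(x.unique()) does.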
Recommended answer
As of pandas 0.20 we can use nunique directly on DataFrames, i.e.:
df.nunique()
a 4
b 5
c 1
dtype: int64
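DataFrame.nunique also takes axis and dropna arguments. The sketch below reuses the small a/b/c frame from this answer. It shows per-row counts with axis=1 and adds a made-up column d with missing values to show how dropna changes the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 1, 2, 3],
                   'b': [1, 2, 3, 4, 5],
                   'c': [1, 1, 1, 1, 1]})

per_column = df.nunique()        # default axis=0: one count per column
per_row = df.nunique(axis=1)     # axis=1: one count per row instead
print(per_column.tolist())       # [4, 5, 1]
print(per_row.tolist())          # [2, 2, 2, 3, 3]

# Add a column with missing values to show the dropna switch.
df['d'] = [1.0, np.nan, 1.0, np.nan, 2.0]
print(df['d'].nunique())              # 2: NaN is ignored by default
print(df['d'].nunique(dropna=False))  # 3: NaN counted as one extra value
```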
Other legacy options:
You could transpose the df and then use apply to call nunique row-wise:
In [205]:
df = pd.DataFrame({'a':[0,1,1,2,3],'b':[1,2,3,4,5],'c':[1,1,1,1,1]})
df
Out[205]:
a b c
0 0 1 1
1 1 2 1
2 1 3 1
3 2 4 1
4 3 5 1
In [206]:
df.T.apply(lambda x: x.nunique(), axis=1)
Out[206]:
a 4
b 5
c 1
dtype: int64
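The transpose trick works because each row of df.T holds one original column. With axis=1, apply therefore hands the lambda one original column at a time. A minimal check on the same example frame shows that this gives the same result as calling apply with its default axis=0, which already iterates over columns.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 1, 2, 3], 'b': [1, 2, 3, 4, 5], 'c': [1, 1, 1, 1, 1]})

# After transposing, the original column labels become the row labels.
transposed = df.T
print(transposed.index.tolist())  # ['a', 'b', 'c']

# axis=1 passes each row of df.T, i.e. each original column, to the lambda.
via_transpose = transposed.apply(lambda x: x.nunique(), axis=1)

# apply's default axis=0 passes the columns of df directly.
via_columns = df.apply(lambda x: x.nunique())

print(via_transpose.tolist())  # [4, 5, 1]
print(via_columns.tolist())    # [4, 5, 1]
```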
EDIT
As pointed out by @ajcr, the transpose is unnecessary:
In [208]:
df.apply(pd.Series.nunique)
Out[208]:
a 4
b 5
c 1
dtype: int64
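DataFrame.apply forwards extra keyword arguments to the function it calls. This means pd.Series.nunique can be given dropna=False through apply. The frame below is a made-up example with missing values in column b.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 1, 2, 3], 'b': [1.0, np.nan, 3.0, np.nan, 5.0]})

# Extra keyword arguments to DataFrame.apply are passed on to the function,
# so pd.Series.nunique receives dropna=False in the second call.
print(df.apply(pd.Series.nunique).tolist())                # [4, 3]
print(df.apply(pd.Series.nunique, dropna=False).tolist())  # [4, 4]
```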