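Using pandas, I would like to get the count of a specific value in a column. I know that df.somecolumn.ravel() will give me all the unique values and their counts. But how do I get the count of one specific value?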

In[5]:df
Out[5]:
        col
         1
         1
         1
         1
         2
         2
         2
         1

Desired:
  To get count of 1.

  In[6]:df.somecalulation(1)
  Out[6]: 5

  To get count of 2.

  In[6]:df.somecalulation(2)
  Out[6]: 3

Best answer

You can try:

# Store the counts under a new name so the original df is not overwritten
counts = df['col'].value_counts().reset_index()
counts.columns = ['col', 'count']
print(counts)
   col  count
0    1      5
1    2      3

Edit:
print((df['col'] == 1).sum())
5

Or:
def somecalulation(x):
    return (df['col'] == x).sum()

print(somecalulation(1))
5
print(somecalulation(2))
3

Or:
ser = df['col'].value_counts()

def somecalulation(s, x):
    return s[x]

print(somecalulation(ser, 1))
5
print(somecalulation(ser, 2))
3
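
One caveat with the lookup above: s[x] raises a KeyError when x never occurs in the column. A minimal sketch of a tolerant variant follows. The helper name count_or_zero and the sample data are assumptions made for illustration; Series.get with a default value is standard pandas.

```python
import pandas as pd

# Sample data shaped like the question: five 1s and three 2s.
df = pd.DataFrame({'col': [1, 1, 1, 1, 2, 2, 2, 1]})
ser = df['col'].value_counts()

def count_or_zero(s, x):
    # Hypothetical helper: Series.get returns the default (0 here)
    # instead of raising KeyError when x is not in the index.
    return int(s.get(x, 0))

print(count_or_zero(ser, 1))
print(count_or_zero(ser, 2))
print(count_or_zero(ser, 7))  # 7 is absent from the column
```

This keeps the one-time cost of value_counts() while making repeated lookups safe for values that may not be present.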

Edit 2:
If you need something very fast, use:
import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, 1, 2, 2])

# for testing, len(a) = 6000
a = pd.concat([a]*1000).reset_index(drop=True)

print(np.in1d(a, 1).sum())
4000
print((a == 1).sum())
4000
print(np.sum(a == 1))
4000
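
On recent NumPy releases, np.in1d is deprecated in favor of np.isin, so the same count may be better written as below. This is a sketch rather than part of the original answer; the variable names n_isin and n_nonzero are illustrative, and np.count_nonzero is shown as one more option, not as a measured improvement.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
a = pd.concat([a] * 1000).reset_index(drop=True)

# Work on the underlying NumPy array to skip pandas overhead.
values = a.to_numpy()

# np.isin is the newer spelling of np.in1d for this use.
n_isin = int(np.isin(values, 1).sum())

# Counting True entries directly on the boolean mask.
n_nonzero = int(np.count_nonzero(values == 1))

print(n_isin, n_nonzero)
```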

Timings:
value_counts
In [131]: %timeit np.in1d(a,1).sum()
The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 29.9 µs per loop

In [132]: %timeit np.sum(a == 1)
10000 loops, best of 3: 196 µs per loop

In [133]: %timeit (a == 1).sum()
1000 loops, best of 3: 180 µs per loop

numpy.in1d
In [135]: %timeit np.in1d(a,1).sum()
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 48.5 µs per loop

In [136]: %timeit np.sum(a == 1)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 273 µs per loop

In [137]: %timeit (a == 1).sum()
1000 loops, best of 3: 271 µs per loop
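
The numbers above depend on the machine and on the pandas and NumPy versions in use, so it may be worth re-measuring locally. A rough sketch using the standard timeit module, outside IPython, is given below. The candidates dictionary and the loop counts are arbitrary choices for illustration, and np.isin stands in for np.in1d on the assumption that a recent NumPy is installed.

```python
import timeit

import numpy as np
import pandas as pd

a = pd.concat([pd.Series([1, 1, 1, 1, 2, 2])] * 1000).reset_index(drop=True)

# Each candidate counts how many elements of a equal 1.
candidates = {
    "np.isin(a, 1).sum()": lambda: np.isin(a, 1).sum(),
    "np.sum(a == 1)": lambda: np.sum(a == 1),
    "(a == 1).sum()": lambda: (a == 1).sum(),
}

# Sanity check: all candidates should agree before being timed.
results = {label: int(fn()) for label, fn in candidates.items()}

for label, fn in candidates.items():
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{label}: {best * 1e6:.1f} µs per call")
```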

Regarding "python - Pandas, get count of a single value in a column of a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36067894/
