


Sample code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Customer' : ['Bob', 'Ken', 'Steve', 'Joe'],
                   'Spending' : [130,22,313,46]})

# 400000 rows x 2 columns (the 4 rows repeated 100000 times)
df = pd.concat([df]*100000).reset_index(drop=True)

In [129]: %timeit df['Grade'] = np.where(df['Spending'] > 100, 'A', 'B')
10 loops, best of 3: 21.6 ms per loop

In [130]: %timeit df['grade'] = df.apply(lambda row: 'A' if row['Spending'] > 100 else 'B', axis=1)
1 loop, best of 3: 7.08 s per loop

The question was taken from here: https://stackoverflow.com/a/41166160/3027854


I think np.where is faster because it works on NumPy arrays in a vectorized way, and pandas is built on top of these arrays.
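A minimal sketch of what the vectorized path does, using the four-row frame from the question: the comparison is evaluated once for the whole column and yields a single boolean array, and `np.where` then picks `'A'` or `'B'` element-wise without calling any Python function per row. The names `mask` and `grades` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                   'Spending': [130, 22, 313, 46]})

# One comparison over the whole underlying NumPy array -> one boolean array.
mask = df['Spending'].to_numpy() > 100

# np.where selects 'A' where mask is True and 'B' elsewhere, element-wise.
grades = np.where(mask, 'A', 'B')

df['Grade'] = grades
print(mask)    # [ True False  True False]
print(grades)  # ['A' 'B' 'A' 'B']
```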

df.apply is slow because, with axis=1, it loops over the rows in Python and calls the lambda once per row.
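To make the per-row cost concrete, here is a rough illustration. `apply_rowwise` is a hypothetical helper, not how pandas actually implements `apply`; it only mimics the observable behaviour of `apply(axis=1)`: one Series object is built per row and one Python function call is made per row.

```python
import pandas as pd

def grade(row):
    # Called once per row; the row arrives as a pandas Series.
    return 'A' if row['Spending'] > 100 else 'B'

def apply_rowwise(frame, func):
    """Hypothetical helper: a Python-level loop approximating frame.apply(func, axis=1)."""
    results = []
    for _, row in frame.iterrows():   # builds one Series object per row
        results.append(func(row))     # one Python function call per row
    return pd.Series(results, index=frame.index)

df = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                   'Spending': [130, 22, 313, 46]})

looped = apply_rowwise(df, grade)
applied = df.apply(grade, axis=1)
print(looped.tolist())                       # ['A', 'B', 'A', 'B']
print(looped.tolist() == applied.tolist())   # True
```

With 400000 rows this means 400000 Series constructions and 400000 interpreted calls, which is where the seconds in the `apply` timing come from.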

Vectorized operations are the fastest, followed by Cython routines, and then apply.
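The `%timeit` magic in the sample only works in IPython/Jupyter. A minimal sketch, assuming a plain Python script, that reproduces the two ends of this ranking (vectorized `np.where` vs. row-wise `apply`) with the standard-library `timeit` module; it does not cover the Cython tier. The frame is smaller than in the question to keep the run short, and absolute timings will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                     'Spending': [130, 22, 313, 46]})
# 2500 copies -> 10000 rows; use 100000 copies to match the 400000-row benchmark.
df = pd.concat([base] * 2_500).reset_index(drop=True)

def vectorized():
    return np.where(df['Spending'] > 100, 'A', 'B')

def row_apply():
    return df.apply(lambda row: 'A' if row['Spending'] > 100 else 'B', axis=1)

# Check that both approaches agree before comparing their speed.
assert list(vectorized()) == row_apply().tolist()

# Best of 3 runs with one call per run, similar to %timeit's "best of 3".
t_vec = min(timeit.repeat(vectorized, number=1, repeat=3))
t_app = min(timeit.repeat(row_apply, number=1, repeat=3))
print(f"np.where: {t_vec * 1000:.2f} ms")
print(f"apply:    {t_app * 1000:.2f} ms")
print(f"apply / np.where: {t_app / t_vec:.0f}x")
```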

See this answer by Jeff, one of the pandas developers, for a better explanation.


08-11 14:08