Why is np.where faster than pd.apply?
Problem description
Sample code is here
import pandas as pd
import numpy as np
df = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                   'Spending': [130, 22, 313, 46]})
# 400000 rows (4 columns once both grade columns below are added)
df = pd.concat([df] * 100000).reset_index(drop=True)
In [129]: %timeit df['Grade']= np.where(df['Spending'] > 100 ,'A','B')
10 loops, best of 3: 21.6 ms per loop
In [130]: %timeit df['grade'] = df.apply(lambda row: 'A' if row['Spending'] > 100 else 'B', axis = 1)
1 loop, best of 3: 7.08 s per loop
Question taken from here: https://stackoverflow.com/a/41166160/3027854
Solution
I think np.where is faster because it works on NumPy arrays in a vectorized way, and pandas is built on these arrays.
df.apply is slow because it uses loops: with axis=1 it calls the Python function once per row.
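To make the per-row loop visible, here is a minimal sketch on the four-row frame from the question. The helper grade_row and the counter calls are illustrative names that are not part of the original post, and the exact call count may vary by pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                   'Spending': [130, 22, 313, 46]})

calls = 0  # hypothetical counter, only here to expose the loop


def grade_row(row):
    """Grade a single row; invoked by df.apply for each row in turn."""
    global calls
    calls += 1
    return 'A' if row['Spending'] > 100 else 'B'


# Row-wise: pandas builds a Series per row and calls grade_row on it.
looped = df.apply(grade_row, axis=1)

# Vectorized: one comparison and one selection over the whole column.
vectorized = np.where(df['Spending'] > 100, 'A', 'B')

print('Python-level calls made by apply:', calls)
print('apply result:   ', looped.tolist())
print('np.where result:', vectorized.tolist())
```

Both approaches produce the same grades. The difference is that apply pays the cost of a Python function call (and of building a row Series) for every row, while np.where makes no Python-level call per element.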
Vectorized operations are the fastest, then Cython routines, and then apply.
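For readers without IPython's %timeit, the comparison can be roughly reproduced with the standard timeit module. This is a sketch under assumptions: the frame is smaller than in the question so the slow variant finishes quickly, the three function names are illustrative, and the column-wise Series.apply variant is added only as a middle ground for comparison. Absolute numbers depend on the machine and the library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the question so the slow variant finishes quickly.
base = pd.DataFrame({'Customer': ['Bob', 'Ken', 'Steve', 'Joe'],
                     'Spending': [130, 22, 313, 46]})
df = pd.concat([base] * 5000).reset_index(drop=True)  # 20000 rows


def with_where():
    # Whole-column comparison and selection in one vectorized call.
    return np.where(df['Spending'] > 100, 'A', 'B')


def with_column_apply():
    # Python function called once per value of a single column.
    return df['Spending'].apply(lambda s: 'A' if s > 100 else 'B')


def with_row_apply():
    # Python function called once per row, as in the question.
    return df.apply(lambda row: 'A' if row['Spending'] > 100 else 'B', axis=1)


# Average wall-clock time per call over three runs of each variant.
timings = {
    fn.__name__: timeit.timeit(fn, number=3) / 3
    for fn in (with_where, with_column_apply, with_row_apply)
}
for name, seconds in timings.items():
    print(f'{name}: {seconds * 1000:.2f} ms per call')
```

All three variants return the same grades, so any difference in the printed times comes from how the work is dispatched, not from what is computed.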
See this answer for a better explanation from Jeff, a pandas developer.