Why is pandas faster than numpy on simple mathematical operations?

Question

Recently, I observed that pandas is faster than numpy at multiplication, as the example below shows. How is this possible for such a simple operation, given that the underlying data container of a pandas DataFrame is a numpy array?

I use arrays/dataframes of shape (10000, 10000).

import numpy as np
import pandas as pd

a = np.random.randn(10000, 10000)
d = pd.DataFrame(a.copy())
a.shape
(10000, 10000)
d.shape
(10000, 10000)
%%timeit
d * d
53.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a * a
318 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Observations

pandas evaluates this simple multiplication about six times faster than numpy (53.2 ms versus 318 ms). How can this be?
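Outside IPython, where the %%timeit magic is not available, the comparison can be reproduced with the standard-library timeit module. The following is a minimal sketch, not the asker's original benchmark: the size n = 2000, the fixed seed, and the repeat counts are assumptions chosen to keep the run short, and the helper best_time is a hypothetical name. Absolute timings depend on the machine, and the gap reported in the question may only show up at the full (10000, 10000) shape.

```python
import timeit

import numpy as np
import pandas as pd

n = 2000  # assumption: smaller than the 10000 used in the question, to keep the run short
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
d = pd.DataFrame(a.copy())


def best_time(func, repeat=3, number=3):
    """Return the best per-call wall time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t_numpy = best_time(lambda: a * a)
t_pandas = best_time(lambda: d * d)

print(f"numpy : {t_numpy * 1e3:.1f} ms per call")
print(f"pandas: {t_pandas * 1e3:.1f} ms per call")

# Both paths must produce the same numbers; only the speed may differ.
same_values = np.allclose((d * d).to_numpy(), a * a)
print("same values:", same_values)
```

Taking the minimum over several repeats reduces the influence of background load on the measurement.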

Recommended answer

Pandas uses numexpr behind the scenes

pandas uses numexpr under the hood if it is installed, which is the case on my machine. If I use numexpr explicitly, I get the following.

numexpr.evaluate evaluates a valid numerical expression, given as a string, on numpy.ndarrays.

import numexpr
%%timeit
numexpr.evaluate('a * a')
52.7 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Observations

The wall time for multiplying the array with itself (52.7 ms) is now roughly the same as the time pandas needs (53.2 ms).
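The "use numexpr if it is installed" behaviour can be imitated for plain numpy arrays. The sketch below is an illustration under assumptions, not pandas' internal code: the helper name square, the array size, and the seed are made up for the example. It also reads the pandas option compute.use_numexpr and temporarily switches it off with pd.option_context, which is one way to check whether numexpr accounts for the speedup on a given machine.

```python
import importlib.util

import numpy as np
import pandas as pd

# numexpr is an optional dependency, so probe for it instead of importing blindly.
HAVE_NUMEXPR = importlib.util.find_spec("numexpr") is not None


def square(arr):
    """Element-wise arr * arr, via numexpr when available, else plain numpy."""
    if HAVE_NUMEXPR:
        import numexpr

        # Pass the operand explicitly instead of relying on frame inspection.
        return numexpr.evaluate("x * x", local_dict={"x": arr})
    return arr * arr


rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 1000))  # assumption: small array for a quick check
d = pd.DataFrame(a.copy())

result = square(a)

print("numexpr installed:", HAVE_NUMEXPR)
print("compute.use_numexpr:", pd.get_option("compute.use_numexpr"))

# Temporarily force pandas onto its plain numpy path for comparison.
with pd.option_context("compute.use_numexpr", False):
    plain = (d * d).to_numpy()

print("results agree:", np.allclose(result, plain))
```

Timing d * d inside and outside the option_context block, for example with timeit, should show whether the difference observed in the question disappears once numexpr is disabled.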

There can be cases where pandas is faster than numpy alone. On the other hand, using numexpr together with numpy gives the same speedup, but you have to do it yourself. Additionally, this is not a typical use case for pandas: usually a dataframe has an Index or a MultiIndex (hierarchical index) attached to at least one axis. How multiplication behaves for dataframes whose MultiIndexes are not equal (broadcasting), for example, still needs to be investigated.
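To illustrate why indexed dataframes are a different situation, here is a small sketch with a plain Index rather than a MultiIndex. The labels and values are invented for the example. pandas aligns the operands by label before multiplying, which is extra work that a positional numpy multiplication never does, so the timings above should not be assumed to carry over.

```python
import numpy as np
import pandas as pd

# Two frames whose row labels only partly overlap (made-up example data).
left = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=["b", "c", "d"])

# pandas matches rows by label: only "b" and "c" exist on both sides,
# so "a" and "d" come out as NaN in the result.
aligned = left * right

# numpy multiplies purely by position and ignores the labels.
positional = left.to_numpy() * right.to_numpy()

print(aligned)
print(positional)
```

The aligned result covers the union of both indexes, while the positional result simply pairs the first row with the first row, the second with the second, and so on.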
