Problem description
Currently I'm using NumPy, which does the job. But as I'm dealing with matrices with several thousands of rows/columns, and later this figure will go up to tens of thousands, I was wondering whether there is a package that can perform this kind of calculation faster?
If your matrix is sparse, then instantiate it using a constructor from scipy.sparse and use the analogous eigenvector/eigenvalue methods in scipy.sparse.linalg. From a performance point of view, this has two advantages (a short sketch follows the list below):

- your matrix, built from the scipy.sparse constructor, will be smaller in proportion to how sparse it is.
- the eigenvalue/eigenvector methods for sparse matrices (eigs, eigsh) accept an optional argument k, the number of eigenvector/eigenvalue pairs you want returned. Nearly always, the number required to account for >99% of the variance is far less than the number of columns, which you can verify after the fact; in other words, you can tell the method not to calculate and return all of the eigenvector/eigenvalue pairs. Beyond the (usually) small subset required to account for the variance, you are unlikely to need the rest.
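For illustration, here is a minimal sketch of the sparse route; the matrix size, density, and k below are arbitrary placeholders, not values from the question:

>>> from scipy import sparse
>>> from scipy.sparse.linalg import eigsh
>>> # hypothetical sparse symmetric matrix; size and density are placeholders
>>> M = sparse.random(10000, 10000, density=1e-4, format='csr', random_state=0)
>>> M = (M + M.T) / 2                        # symmetrize so eigsh applies
>>> vals, vecs = eigsh(M, k=6, which='LM')   # ask for 6 pairs, not 10,000
>>> vecs.shape
(10000, 6)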
Use the linear algebra library in SciPy, scipy.linalg, instead of the NumPy library of the same name. The two libraries have the same name and use the same method names, yet there is a difference in performance. This difference arises because numpy.linalg is a less faithful wrapper over the analogous LAPACK routines, sacrificing some performance for portability and convenience (i.e., to comply with the NumPy design goal that the entire NumPy library should be built without a Fortran compiler). scipy.linalg, on the other hand, is a much more complete wrapper over LAPACK, and it uses f2py.
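As a rough illustration of how direct that swap is (the random matrix below is just a placeholder), the two calls look the same:

>>> import numpy as np
>>> from scipy import linalg as sla
>>> A = np.random.rand(1000, 1000)
>>> w_np, v_np = np.linalg.eig(A)    # NumPy's portable wrapper
>>> w_sp, v_sp = sla.eig(A)          # SciPy's fuller LAPACK wrapper, same call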
Select the function appropriate for your use case; in other words, don't use a function that does more than you need. In scipy.linalg there are several functions to calculate eigenvalues; the differences are not large, but by carefully choosing the function used to calculate eigenvalues you should see a performance boost. For instance (a short sketch follows this list):

- scipy.linalg.eig returns both the eigenvalues and the eigenvectors
- scipy.linalg.eigvals returns only the eigenvalues. So if you only need the eigenvalues of a matrix, do not use linalg.eig; use linalg.eigvals instead.
- if you have a real-valued square symmetric matrix (equal to its transpose), then use scipy.linalg.eigh (or scipy.sparse.linalg.eigsh for the sparse case)
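A minimal sketch of those narrower calls, with random matrices used as placeholders:

>>> import numpy as np
>>> from scipy import linalg as LA
>>> A = np.random.rand(500, 500)
>>> S = A + A.T                       # real symmetric matrix
>>> w_only = LA.eigvals(A)            # eigenvalues only, no eigenvectors
>>> w_sym, v_sym = LA.eigh(S)         # symmetric solver; eigenvalues come back real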
Optimize your SciPy build. Preparing your SciPy build environment is done largely in SciPy's setup.py script. Perhaps the most significant option performance-wise is identifying an optimized LAPACK library, such as ATLAS or the Accelerate/vecLib framework (OS X only?), so that SciPy can detect it and build against it. Depending on the rig you have at the moment, optimizing your SciPy build and then re-installing can give you a substantial performance increase. Additional notes from the SciPy core team are here.
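To check which BLAS/LAPACK libraries your current installation was actually built against, NumPy and SciPy each expose a show_config helper:

>>> import numpy, scipy
>>> numpy.show_config()    # prints the BLAS/LAPACK details NumPy was linked against
>>> scipy.show_config()    # same report for SciPy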
Will these functions work for large matrices?
I should think so. These are industrial-strength matrix decomposition methods, which are just thin wrappers over the analogous Fortran LAPACK routines.
I have used most of the methods in the linalg library to decompose matrices in which the number of columns is usually between about 5 and 50, and in which the number of rows usually exceeds 500,000. Neither the SVD nor the eigenvalue methods seem to have any problem handling matrices of this size.
Using the SciPy library linalg, you can calculate eigenvectors and eigenvalues with a single call, using any of several methods from this library: eig, eigvalsh, and eigh.
>>> import numpy as NP
>>> from scipy import linalg as LA
>>> A = NP.random.randint(0, 10, 25).reshape(5, 5)
>>> A
array([[9, 5, 4, 3, 7],
[3, 3, 2, 9, 7],
[6, 5, 3, 4, 0],
[7, 3, 5, 5, 5],
[2, 5, 4, 7, 8]])
>>> e_vals, e_vecs = LA.eig(A)
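Continuing the session above (just a sketch): eigvalsh and eigh expect a symmetric (Hermitian) matrix, so A is symmetrized first.

>>> S = (A + A.T) / 2                 # eigh/eigvalsh assume a symmetric matrix
>>> sym_vals = LA.eigvalsh(S)         # eigenvalues only, returned as real numbers
>>> sym_vals2, sym_vecs = LA.eigh(S)  # eigenvalues and eigenvectors in one call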