I have X_train and y_train as two numpy.ndarrays of shape (32561, 108) and (32561,) respectively.
Every time I call fit on my GaussianProcessClassifier, I get a MemoryError.
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X_train.shape
(32561, 108)
>>> y_train.shape
(32561,)
>>> gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
>>> gp_opt.fit(X_train,y_train)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 613, in fit
    self.base_estimator_.fit(X, y)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 209, in fit
    self.kernel_.bounds)]
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 427, in _constrained_optimization
    fmin_l_bfgs_b(obj_func, initial_theta, bounds=bounds)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 199, in fmin_l_bfgs_b
    **opts)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 63, in __call__
    fg = self.fun(x, *args)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 201, in obj_func
    theta, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 338, in log_marginal_likelihood
    K, K_gradient = kernel(self.X_train_, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 753, in __call__
    K1, K1_gradient = self.k1(X, Y, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 1002, in __call__
    K = self.constant_value * np.ones((X.shape[0], Y.shape[0]))
  File "/home/retsim/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 188, in ones
    a = empty(shape, dtype, order)
MemoryError
>>>
Why am I getting this error, and how can I fix it?
Best answer
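On line 400 of gpc.py, the implementation of the classifier you are using creates a matrix of size (N, N), where N is the number of observations. So the code is trying to create a matrix of shape (32561, 32561). That will obviously cause problems, since the matrix has over one billion elements.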
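To put a number on that, here is a quick back-of-the-envelope sketch. It assumes the matrix is stored as float64 (8 bytes per element, which is what np.ones gives by default); the helper name kernel_matrix_bytes is made up for illustration.

```python
def kernel_matrix_bytes(n_samples, bytes_per_element=8):
    """Bytes needed for one dense (n_samples, n_samples) matrix."""
    return n_samples * n_samples * bytes_per_element

n = 32561                       # number of rows in X_train
n_elements = n * n              # entries in the (N, N) matrix
size_bytes = kernel_matrix_bytes(n)

print(n_elements)               # 1060218721
print(round(size_bytes / 1e9, 2))  # 8.48 (GB, decimal)
```

That is for a single copy. The traceback shows the kernel being called with eval_gradient=True, so gradient arrays of comparable size are requested on top of it.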
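As for why it does this, I don't really know the details of scikit-learn's implementation, but in general a Gaussian process needs the covariance (kernel) matrix between every pair of training points, so memory grows quadratically with the number of observations. Gaussian processes also aren't that great if you have high-dimensional data. (The docs say "high-dimensional" means more than a few dozen features.)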
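A small sketch of the point about the covariance matrix, using the same kernel as in the question on a synthetic stand-in array (X_small is not the real data). The shape of the result depends only on the number of rows, not on the 108 features.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in: 200 samples with 108 features each.
rng = np.random.RandomState(0)
X_small = rng.rand(200, 108)

# Same kernel as in the question.
kernel = 1.0 * RBF(length_scale=1.0)

# Evaluating the kernel on X_small gives one entry per pair of samples.
K = kernel(X_small)
print(K.shape)  # (200, 200)
```

With 32561 rows the same call asks for a (32561, 32561) array, which is where the MemoryError comes from.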
My only suggestion for how to fix it is to work in batches. Scikit-learn may have some utilities to generate batches for you, or you can do it manually.
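One possible reading of "work in batches" is sketched below. GaussianProcessClassifier has no partial_fit, so this is not incremental training: it fits one independent model per batch and averages their predicted probabilities. That is an assumption about what is meant here, not something the answer spells out. The data is synthetic, the batch size is arbitrary, and sklearn.utils.gen_batches is used to produce the slices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.utils import gen_batches, shuffle

# Synthetic stand-in for the real data (assumption: binary labels).
rng = np.random.RandomState(0)
X_train = rng.rand(600, 10)
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)

# Shuffle so each batch contains both classes, then split into batches.
X_shuf, y_shuf = shuffle(X_train, y_train, random_state=0)
batch_size = 200  # arbitrary; each fit only needs a (200, 200) kernel matrix

models = []
for batch in gen_batches(len(X_shuf), batch_size):
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
    gp.fit(X_shuf[batch], y_shuf[batch])
    models.append(gp)

# Average the predicted class probabilities across the per-batch models.
X_new = X_train[:5]
proba = np.mean([m.predict_proba(X_new) for m in models], axis=0)
pred = models[0].classes_[np.argmax(proba, axis=1)]
print(pred)
```

Memory per fit now scales with batch_size squared instead of N squared. The trade-off is that each model only sees part of the data, so accuracy may suffer compared with a single full fit.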
Regarding python - scikit-learn GaussianProcessClassifier memory error when using the fit() function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49524761/