Orthogonal matching pursuit regression: am I using it wrong?

Problem description


I am trying out this method as a regularized regression, as an alternative to lasso and elastic net. I have 40k data points and 40 features. Lasso selects 5 features, and orthogonal matching pursuit selects only 1.


What could be causing this? Am I using OMP the wrong way? Perhaps it is not meant to be used as a regression. Please let me know if you can think of anything else I may be doing wrong.
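
For context, here is a minimal sketch of the kind of comparison described above. The question does not include the actual code, so the data below is synthetic and the estimator choices (cross-validated Lasso and OMP with default settings) are assumptions:

import numpy
import sklearn.datasets
import sklearn.linear_model

# synthetic stand-in with the shape described in the question: 40k samples, 40 features
X, y = sklearn.datasets.make_regression(n_samples=40000, n_features=40,
                                        n_informative=10, noise=10.0,
                                        random_state=0)

lasso = sklearn.linear_model.LassoCV(cv=5).fit(X, y)
omp = sklearn.linear_model.OrthogonalMatchingPursuitCV(cv=5).fit(X, y)

# "selected features" = nonzero coefficients in the fitted model
print("lasso selects", numpy.count_nonzero(lasso.coef_), "features")
print("omp selects", numpy.count_nonzero(omp.coef_), "features")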

Recommended answer


Orthogonal Matching Pursuit seems a bit broken, or at least very sensitive to input data, as implemented in scikit-learn.

Example:

import sklearn.linear_model
import sklearn.datasets
import numpy

X, y, w = sklearn.datasets.make_regression(n_samples=40000, n_features=40, n_informative=10, coef=True, random_state=0)

# note: the normalize keyword has been removed in newer scikit-learn releases; drop it there
clf1 = sklearn.linear_model.LassoLarsCV(fit_intercept=True, normalize=False, max_n_alphas=1e6)
clf1.fit(X, y)

clf2 = sklearn.linear_model.OrthogonalMatchingPursuitCV(fit_intercept=True, normalize=False)
clf2.fit(X, y)

# this is 1e-10, LassoLars is basically exact on this data
print(numpy.linalg.norm(y - clf1.predict(X)))

# this is 7e+8, OMP is broken
print(numpy.linalg.norm(y - clf2.predict(X)))
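
As a follow-up check (not part of the original answer), the fitted coefficients can also be compared against the true coefficient vector w that make_regression returns with coef=True, reusing the variables from the snippet above:

# reuses X, y, w, clf1, clf2 from the snippet above
print(numpy.count_nonzero(w), "nonzero coefficients in the true model")
print(numpy.count_nonzero(clf1.coef_), "nonzero coefficients in LassoLarsCV")
print(numpy.count_nonzero(clf2.coef_), "nonzero coefficients in OMP")

# distance of each estimate from the true coefficients
print("LassoLarsCV coefficient error:", numpy.linalg.norm(clf1.coef_ - w))
print("OMP coefficient error:", numpy.linalg.norm(clf2.coef_ - w))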

Interesting experiments:


  • There are a bunch of canned datasets in sklearn.datasets. Does OMP fail on all of them? Apparently, it works okay on the diabetes dataset...


  • Is there any combination of parameters to make_regression that would generate data that OMP works for? Still looking for that one... 100 x 100 and 100 x 10 fail in the same way. (A sketch of both checks follows below.)
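
A minimal sketch of both checks mentioned in the list above; the dataset call and estimator settings here are assumptions, not code from the original answer:

import numpy
import sklearn.datasets
import sklearn.linear_model

# check 1: OMP on the canned diabetes dataset
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
clf = sklearn.linear_model.OrthogonalMatchingPursuitCV(fit_intercept=True).fit(X, y)
print("diabetes residual norm:", numpy.linalg.norm(y - clf.predict(X)))
print("diabetes features selected:", numpy.count_nonzero(clf.coef_))

# check 2: smaller make_regression problems (100 x 100 and 100 x 10)
for n_samples, n_features in [(100, 100), (100, 10)]:
    X, y = sklearn.datasets.make_regression(n_samples=n_samples, n_features=n_features,
                                            n_informative=10, random_state=0)
    clf = sklearn.linear_model.OrthogonalMatchingPursuitCV(fit_intercept=True).fit(X, y)
    print(n_samples, "x", n_features, "residual norm:",
          numpy.linalg.norm(y - clf.predict(X)))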

