Problem description
I am performing logistic regression with L2 regularization in MATLAB on text data. My program works well on small datasets, but on larger ones it runs indefinitely.
I have seen the potential duplicate question (matlab fminunc not quitting (running indefinitely)). In that question, the cost at the initial theta was NaN and an error was printed to the console. In my case the cost is real-valued, and no error appears even when I pass verbose display options to fminunc(). Hence I believe this question is not a duplicate.
I need help scaling it to larger sets. The training data I am currently working with is roughly 10k × 12k (10k text files with a combined vocabulary of 12k words), so I have m = 10k training examples and n = 12k features.
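For a sense of scale, here is the back-of-the-envelope memory arithmetic for these sizes, assuming 8-byte doubles and dense storage. The first line is a full X. The second is a single dense n-by-n matrix, such as a full Hessian over all 12k features. Bag-of-words features are usually mostly zeros, so storing X as a MATLAB sparse matrix would typically need far less than the first figure. That is an assumption about this data set, which the question does not state.

```latex
\begin{aligned}
m\,n \times 8\ \text{B} &= 10^4 \times (1.2\times10^4) \times 8\ \text{B} = 9.6\times10^8\ \text{B} \approx 0.96\ \text{GB},\\
n^2 \times 8\ \text{B} &= (1.2\times10^4)^2 \times 8\ \text{B} = 1.152\times10^9\ \text{B} \approx 1.15\ \text{GB}.
\end{aligned}
```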
My cost function is defined as follows:
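function [J, gradient] = costFunction(X, y, lambda, theta)
% Regularized logistic regression cost and gradient; theta(1) (the bias) is not regularized.
m = size(X, 1);
h = 1 ./ (1 + exp(-X*theta));   % sigmoid of the linear predictor (no need for inline)
J = (1/m)*sum(-y.*log(h) - (1-y).*log(1-h)) + (lambda/(2*m))*sum(theta(2:end).^2);
% Vectorized gradient: one product with X' replaces the loop over all n columns.
gradient = (1/m)*(X'*(h - y));
% d/dtheta of +(lambda/(2m))*||theta(2:end)||^2 is +(lambda/m)*theta, hence the plus sign.
gradient(2:end) = gradient(2:end) + (lambda/m)*theta(2:end);
end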
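fminunc evaluates the cost function many times, so its per-call cost matters. A quick experiment on random data shows how much a per-column loop costs and confirms that vectorizing it gives the same numbers. This is only an illustration. The sizes, data and variable names below are hypothetical and far smaller than the real 10k × 12k problem. The sketch uses the +(lambda/m)*theta(i) penalty gradient, which is what differentiating the (lambda/(2m))*||theta(2:end)||^2 term in J gives.

```matlab
% Compare a loop-based gradient with a vectorized one on hypothetical random data.
rng(0);                                  % reproducible random numbers
m = 500; n = 300; lambda = 1;
X = [ones(m, 1), rand(m, n - 1)];        % first column of ones = bias feature
y = double(rand(m, 1) > 0.5);            % random 0/1 labels
theta = 0.01 * randn(n, 1);
h = 1 ./ (1 + exp(-X*theta));            % sigmoid

tic;                                     % loop version: one pass over X per feature
gLoop = zeros(n, 1);
gLoop(1) = (1/m)*sum((h - y) .* X(:,1));
for i = 2:n
    gLoop(i) = (1/m)*sum((h - y) .* X(:,i)) + (lambda/m)*theta(i);
end
tLoop = toc;

tic;                                     % vectorized version: one matrix-vector product
gVec = (1/m)*(X'*(h - y));
gVec(2:end) = gVec(2:end) + (lambda/m)*theta(2:end);
tVec = toc;

fprintf('max |difference| = %g, loop %.4f s, vectorized %.4f s\n', ...
        max(abs(gLoop - gVec)), tLoop, tVec);
```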
I am performing optimization using MATLAB's fminunc() function. The parameters I pass to fminunc() are:
options = optimset('LargeScale', 'on', 'GradObj', 'on', 'MaxIter', MAX_ITR);
theta0 = zeros(n, 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(X, y, lambda, t), theta0, options);
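costFunction leaves theta(1) unregularized and treats X(:,1) as the bias feature, which implicitly assumes X(:,1) is a column of ones. The question does not say how X is built. The following hedged sketch builds a sparse term-document matrix with sparse(i, j, v, m, n) and prepends that bias column; the triplet data and variable names are hypothetical. Concatenating a full column with a sparse matrix keeps the result sparse, and X*theta and X'*(h - y) work unchanged on a sparse X.

```matlab
% Hypothetical bag-of-words data as (document, word, count) triplets.
docIdx  = [1; 1; 2; 3; 3; 3];
wordIdx = [1; 4; 2; 1; 3; 4];
counts  = [2; 1; 5; 1; 1; 3];
numDocs = 3; vocabSize = 4;

% sparse(i, j, v, m, n) stores only the nonzero entries.
Xraw = sparse(docIdx, wordIdx, counts, numDocs, vocabSize);

% Prepend a column of ones so that theta(1) acts as the unregularized bias term.
X = [ones(numDocs, 1), Xraw];
[m, n] = size(X);
theta0 = zeros(n, 1);
```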
I am running this code on a machine with these specifications:
Macbook Pro i7 2.8GHz / 8GB RAM / MATLAB R2011b
The cost function seems to behave correctly: for the initial theta, I get acceptable values of J and the gradient.
K>> theta0 = zeros(n, 1);
K>> [j g] = costFunction(X, y, lambda, theta0);
K>> j
j =
0.6931
K>> max(g)
ans =
0.4082
K>> min(g)
ans =
-2.7021e-05
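These values are consistent with theta0 = 0. Every h equals 0.5 there, so J = log 2 ≈ 0.6931 whatever the data. That starting point is a weak test of the gradient, though. The regularization term and its gradient are both zero at theta = 0, so a wrong sign in the penalty gradient would go unnoticed. A central finite-difference check at a random nonzero theta is more thorough. The sketch below uses small hypothetical random data, with costFn, gradFn and gradWrong as anonymous-function stand-ins for costFunction. gradFn uses +(lambda/m)*theta(i), and gradWrong flips that sign to show the check catching it.

```matlab
% Central finite-difference gradient check at a random, nonzero theta.
rng(1);
m = 200; n = 20; lambda = 5;
X = [ones(m, 1), rand(m, n - 1)];
y = double(rand(m, 1) > 0.5);
theta = 0.5 * randn(n, 1);               % nonzero, so the penalty term matters

sigmoid = @(z) 1 ./ (1 + exp(-z));
costFn = @(t) (1/m)*sum(-y.*log(sigmoid(X*t)) - (1-y).*log(1 - sigmoid(X*t))) ...
              + (lambda/(2*m))*sum(t(2:end).^2);
gradFn    = @(t) (1/m)*(X'*(sigmoid(X*t) - y)) + (lambda/m)*[0; t(2:end)];
gradWrong = @(t) (1/m)*(X'*(sigmoid(X*t) - y)) - (lambda/m)*[0; t(2:end)];

epsilon = 1e-6;
numGrad = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1); e(i) = epsilon;
    numGrad(i) = (costFn(theta + e) - costFn(theta - e)) / (2*epsilon);
end

relErr      = norm(numGrad - gradFn(theta))    / norm(numGrad + gradFn(theta));
relErrWrong = norm(numGrad - gradWrong(theta)) / norm(numGrad + gradWrong(theta));
fprintf('relative error: correct sign %g, flipped sign %g\n', relErr, relErrWrong);
```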
The program takes incredibly long to run. I started profiling with MAX_ITR = 1 for fminunc(). Even with a single iteration, the program had not finished after a couple of hours. My questions are:
- Am I doing something wrong mathematically?
- Should I use another optimizer instead of fminunc()? With LargeScale=on, fminunc() uses trust-region algorithms.
- Is this problem cluster-scale, i.e. too large to run on a single machine?
Any other general tips will be appreciated. Thanks!
Update: I was able to solve this by setting the LargeScale flag to 'off' in fminunc(); details are in the answer below.
Recommended answer
I was able to get this working by setting the LargeScale flag to 'off' in fminunc(). From what I gather, LargeScale = 'on' uses trust-region algorithms, while 'off' uses quasi-Newton methods. Using the quasi-Newton method and passing the gradient was much faster for this particular problem and gave very good results.
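As far as I understand the fminunc documentation, the trust-region setting was slow because of the Hessian. With LargeScale 'on' and only the gradient supplied, the trust-region algorithm approximates the Hessian by finite differences of the gradient. Without a HessPattern it assumes a dense pattern, which costs on the order of n extra gradient evaluations per iteration, roughly 12k here. The quasi-Newton (BFGS) method instead updates a Hessian approximation from successive gradients. I believe it still keeps an n-by-n matrix, but it avoids that finite differencing.

The sketch below runs the working configuration on a small hypothetical random problem. It needs the Optimization Toolbox and must be saved as quasiNewtonDemo.m. quasiNewtonDemo and the local function costGrad are hypothetical names; costGrad mirrors costFunction but only computes the gradient when a second output is requested.

```matlab
function [optTheta, exitFlag, gradNorm] = quasiNewtonDemo()
% Hypothetical demo: regularized logistic regression solved with fminunc's
% medium-scale (quasi-Newton) algorithm on small random data.
rng(2);
m = 1000; n = 50; lambda = 1;
X = [ones(m, 1), rand(m, n - 1)];      % first column of ones = bias feature
y = double(rand(m, 1) > 0.5);          % random 0/1 labels

% R2011b-style options, as in the answer: LargeScale 'off' plus a user-supplied gradient.
options = optimset('LargeScale', 'off', 'GradObj', 'on', 'MaxIter', 400);
% Newer releases express the same choice with optimoptions, e.g.:
%   options = optimoptions('fminunc', 'Algorithm', 'quasi-newton', ...
%                          'SpecifyObjectiveGradient', true, 'MaxIterations', 400);

[optTheta, ~, exitFlag] = fminunc(@(t) costGrad(X, y, lambda, t), zeros(n, 1), options);
[~, g] = costGrad(X, y, lambda, optTheta);
gradNorm = norm(g);                    % gradient norm at the returned point
end

function [J, g] = costGrad(X, y, lambda, theta)
% Stand-in for costFunction: regularized cost, plus the gradient only when requested.
m = size(X, 1);
h = 1 ./ (1 + exp(-X*theta));
J = (1/m)*sum(-y.*log(h) - (1-y).*log(1-h)) + (lambda/(2*m))*sum(theta(2:end).^2);
if nargout > 1
    g = (1/m)*(X'*(h - y));
    g(2:end) = g(2:end) + (lambda/m)*theta(2:end);
end
end
```

Called as `[optTheta, exitFlag, gradNorm] = quasiNewtonDemo();`, a positive exitFlag and a small gradNorm indicate that fminunc converged.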