The problem is very simple: there are only 5 samples.
However, gradient descent converges extremely slowly, on the order of millions of iterations.
Why? Is there a bug in my algorithm?
P.S. The Julia code is below:
X = [
1.0 34.6237 78.0247;
1.0 30.2867 43.895;
1.0 35.8474 72.9022;
1.0 60.1826 86.3086;
1.0 79.0327 75.3444
]
Y = [0 0 0 1 1]'
sigmoid(z) = 1 / (1 + e ^ -z)
# Cost function.
function costJ(Theta, X, Y)
    m = length(Y)
    H = map(z -> sigmoid(z), (Theta' * X')')
    sum((-Y)'*log(H) - (1-Y)'*log(1 - H)) / m
end
# Gradient.
function gradient(Theta, X, Y)
    m = length(Y)
    H = map(z -> sigmoid(z), (Theta' * X')')
    # Equivalent to X' * (H - Y) / m, written with an extra pair of transposes.
    (((X'*H - X'*Y)') / m)'
end
# Gradient Descent.
function gradientDescent(X, Y, Theta, alpha, nIterations)
    m = length(Y)
    jHistory = Array(Float64, nIterations)
    for i = 1:nIterations
        jHistory[i] = costJ(Theta, X, Y)
        Theta = Theta - alpha * gradient(Theta, X, Y)
    end
    Theta, jHistory
end
gradientDescent(X, Y, [0 0 0]', 0.0001, 1000)
Best Answer
I think @colinefang's comment may be the right diagnosis. Try plotting jHistory - does it always decrease?
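For instance, here is a minimal sketch of that check, assuming the code from the question runs as written in your Julia version (the Plots.jl line is optional and only works if that package is installed):

# Run the descent and inspect the recorded cost values.
Theta, jHistory = gradientDescent(X, Y, [0 0 0]', 0.0001, 1000)
println(all(diff(jHistory) .<= 0))  # true => the cost never increased
# using Plots; plot(jHistory)       # optional: visualize the cost curve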
Another thing you can do is add a simple line search on every iteration to make sure the cost always decreases, something like:
function linesearch(g, X, Y, Theta; alpha=1.0)
    init_cost = costJ(Theta, X, Y)
    while costJ(Theta - alpha*g, X, Y) > init_cost
        alpha = alpha / 2.0 # or divide by some other constant >1
    end
    return alpha
end
Then modify the gradient descent function slightly to search for alpha on each iteration:
for i = 1:nIterations
    g = gradient(Theta, X, Y)
    alpha = linesearch(g, X, Y, Theta)
    Theta = Theta - alpha * g
end
You can make various performance enhancements to the code above; I just wanted to give you a flavor of the idea.
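One such enhancement (my own suggestion, not from the original answer) is to cap the number of halvings, so the while loop cannot spin forever if no step length reduces the cost, for example:

# Defensive variant of linesearch: give up after a fixed number of halvings.
function linesearch_capped(g, X, Y, Theta; alpha=1.0, maxHalvings=50)
    init_cost = costJ(Theta, X, Y)
    for i = 1:maxHalvings
        if costJ(Theta - alpha*g, X, Y) <= init_cost
            return alpha
        end
        alpha = alpha / 2.0
    end
    return alpha  # by now alpha is about 1e-15 of its starting value
end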
On machine-learning - Why does simple logistic regression need millions of iterations to converge?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36364040/