This article looks at what to do when a gradient descent algorithm will not converge. It should be a useful reference for anyone running into the same problem.

Problem description

I'm trying to write out a bit of code for the gradient descent algorithm explained in the Stanford Machine Learning lecture (lecture 2 at around 25:00). Below is the implementation I used at first, and I think it's properly copied over from the lecture, but it doesn't converge when I add large numbers (>8) to the training set.

I'm inputting a number X, and the point (X, X) is added to the training set, so at the moment I'm only trying to get it to converge to y = ax + b where a = 1 = theta[1] and b = 0 = theta[0]. The training set is the arrays x and y, where (x[i], y[i]) is a point.

void train()
{
    double delta;
    for (int i = 0; i < x.size(); i++)
    {
        // error between the target and the current prediction
        delta = y[i]-hypothesis(x[i]);
        // stochastic gradient step for the slope and the intercept
        theta[1] += alpha*delta*x[i];
        theta[0] += alpha*delta*1;
    }
}

void C_Approx::display()
{
    std::cout<<theta[1]<<"x + "<<theta[0]<<" \t "<<"f(x)="<<hypothesis(1)<<std::endl;
}
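
The snippets above use hypothesis(), alpha, theta, x, and y, none of which are shown in the question (train() also appears without the C_Approx:: qualifier that display() has, so both are presumably members of the same class). For reference, here is a minimal sketch of what the surrounding class might look like for the linear model y = theta[1]*x + theta[0]; apart from the C_Approx name, everything in it is an assumption:

#include <iostream>
#include <vector>

class C_Approx
{
public:
    double alpha = 0.1;                 // learning rate -- value is an assumption
    double theta[2] = {0.0, 0.0};       // theta[0] = intercept b, theta[1] = slope a
    std::vector<double> x, y;           // training set: (x[i], y[i]) pairs

    // linear hypothesis h(x) = theta[1]*x + theta[0]
    double hypothesis(double xi) const
    {
        return theta[1]*xi + theta[0];
    }

    // the question adds the point (X, X) for each input X
    void add_point(double v)
    {
        x.push_back(v);
        y.push_back(v);
    }

    void train();    // as defined in the question
    void display();  // as defined in the question
};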

Some of the results I'm getting: I input a number, it runs train() a few times, then display().

1
0.33616x + 0.33616   f(x)=0.67232
1
0.482408x + 0.482408     f(x)=0.964816
1
0.499381x + 0.499381     f(x)=0.998762
1
0.499993x + 0.499993     f(x)=0.999986
1
0.5x + 0.5   f(x)=1

An example of it diverging after it passes 8:

1
0.33616x + 0.33616   f(x)=0.67232
2
0.705508x + 0.509914     f(x)=1.21542
3
0.850024x + 0.449928     f(x)=1.29995
4
0.936062x + 0.330346     f(x)=1.26641
5
0.951346x + 0.231295     f(x)=1.18264
6
0.992876x + 0.137739     f(x)=1.13062
7
0.932206x + 0.127372     f(x)=1.05958
8
1.00077x + 0.000493063   f(x)=1.00126
9
-0.689325x + -0.0714712      f(x)=-0.760797
10
4.10321e+08x + 4.365e+07     f(x)=4.53971e+08
11
1.79968e+22x + 1.61125e+21   f(x)=1.9608e+22
12
-3.9452e+41x + -3.26957e+40      f(x)=-4.27216e+41

I tried the solution proposed here of scaling the step and ended up with similar results. What am I doing wrong?

Recommended answer

Your implementation is good. Generally, stochastic gradient descent can diverge when α is too large. With a large dataset, you would take a reasonably sized random sample, find the α that gives you the best results, and then use it for the rest of the data.
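
Since the update theta[1] += alpha*delta*x[i] scales with x[i], larger training values make the effective step larger, which is presumably why the runs blow up once numbers above 8 are added. As a rough illustration of the answer's suggestion, one could train on a small sample with a few candidate values of α and keep the one that ends up with the lowest squared error; the candidate list, the number of passes, and the pick_alpha() helper below are all hypothetical, and the code assumes the C_Approx sketch shown earlier:

#include <limits>
#include <vector>

// Hypothetical helper: try a few learning rates on a sample and keep the best one.
double pick_alpha(const C_Approx &model,
                  const std::vector<double> &sample_x,
                  const std::vector<double> &sample_y)
{
    const double candidates[] = {0.3, 0.1, 0.03, 0.01, 0.003};
    double best_alpha = candidates[0];
    double best_error = std::numeric_limits<double>::infinity();

    for (double a : candidates)
    {
        C_Approx trial = model;          // each candidate starts from the same state
        trial.alpha = a;
        trial.x = sample_x;
        trial.y = sample_y;
        for (int pass = 0; pass < 100; pass++)
            trial.train();

        // squared error of the trained hypothesis on the sample
        double err = 0.0;
        for (size_t i = 0; i < sample_x.size(); i++)
        {
            double d = sample_y[i] - trial.hypothesis(sample_x[i]);
            err += d*d;
        }

        // diverged runs give NaN/inf errors, so they never win this comparison
        if (err < best_error)
        {
            best_error = err;
            best_alpha = a;
        }
    }
    return best_alpha;
}

The winning α can then be reused for gradient descent on the full dataset.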

That concludes this article on a gradient descent algorithm that will not converge. We hope the recommended answer is helpful.
