python - Gradient descent is diverging

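I'm following Andrew Ng's Coursera course, and I tried to write a basic Python implementation of gradient descent using the housing data that I believe he also used in his slides (which can be found here). I'm not using numpy or scikit-learn or anything else; I'm just trying to get the code working with 1D input and output, fitting a line of the form theta0 + theta1 * x (2 variables). My code is very simple, but it still manages to diverge, even when I raise or lower the learning rate or let it run for more iterations. I've looked at and tried several other formulas, but it still diverges. I made sure the data is loaded correctly. Here is the code: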

dataset_f = open("housing_prices.csv", "r")

dataset = dataset_f.read().split("\n")

xs = []
ys = []

for line in dataset:
    split = line.split(",")
    xs.append(int(split[0]))
    ys.append(int(split[2]))

m = float(len(xs))

learning_rate = 1e-5

theta0 = 0
theta1 = 0

n_steps = 1


def converged():
    return n_steps > 1000


while not converged():
    print("Step #" + str(n_steps))
    print("θ Naught: {}".format(theta0))
    print("θ One: {}".format(theta1))

    theta0_gradient = (1.0 / m) * sum([(theta0 + theta1 * xs[i] - ys[i]) for i in range(int(m))])
    theta1_gradient = (1.0 / m) * sum([(theta0 + theta1 * xs[i] - ys[i]) * xs[i] for i in range(int(m))])

    theta0_temp = theta0 - learning_rate * theta0_gradient
    theta1_temp = theta1 - learning_rate * theta1_gradient

    theta0 = theta0_temp
    theta1 = theta1_temp

    n_steps += 1

print(theta0)
print(theta1)

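Theta naught and theta one quickly become nan as they reach infinity. What I did notice is that theta naught and theta one both oscillate between positive and negative values and keep growing in magnitude. For example: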

Step #1
θ Naught: 0
θ One: 0

Step #2
θ Naught: 3.4041265957446813
θ One: 7642.091281914894

Step #3
θ Naught: -146.0856377478662
θ One: -337844.5760108272

Step #4
θ Naught: 6616.511688310662
θ One: 15281052.424862152

Step #5
θ Naught: -299105.2400554526
θ One: -690824180.132845

Step #6
θ Naught: 13522088.241560074
θ One: 31231058614.54401

Step #7
θ Naught: -611311852.8608981
θ One: -1411905961438.4395

Step #8
θ Naught: 27636426469.18927
θ One: 63829999475126.086

Step #9
θ Naught: -1249398426624.6619
θ One: -2885651696197370.0

Step #10
θ Naught: 56483294981582.41
θ One: 1.304556757051869e+17

Step #11
θ Naught: -2553518992810967.5
θ One: -5.89769144561785e+18

Step #12
θ Naught: 1.1544048994968486e+17
θ One: 2.6662515218056607e+20

Step #13
θ Naught: -5.218879028251596e+18
θ One: -1.2053694641507752e+22

Best answer

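I've made a few small changes to your code. Ignore the imports I have; they are purely for my own plotting purposes. This should work with your new dataset. The main changes are just adjusting the learning rates (theta0 and theta1 now each get their own) and removing some unnecessary casts.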

import matplotlib.pyplot as plt
import numpy as np

dataset_f = open("actual_housing_prices.csv", "r")

dataset = dataset_f.read().split("\n")

xs = []
ys = []

for line in dataset:
    split = line.split(",")
    xs.append(int(split[0]))
    ys.append(int(split[2]))

m = len(xs)

learning_rate1 = 1e-7
learning_rate2 = 1e-3

theta0 = 0
theta1 = 0

n_steps = 1


def converged():
    return n_steps > 100000


while not converged():
    print("Step #" + str(n_steps))
    print("Theta Naught: {}".format(theta0))
    print("Theta One: {}".format(theta1))

    theta0_gradient = (1.0 / m) * sum([theta0 + theta1*xs[i] - ys[i] for i in range(m)])
    theta1_gradient = (1.0 / m) * sum([(theta0 + theta1*xs[i] - ys[i])* xs[i] for i in range(m)])

    theta0_temp = theta0 - learning_rate2 * theta0_gradient
    theta1_temp = theta1 - learning_rate1 * theta1_gradient

    theta0 = theta0_temp
    theta1 = theta1_temp

    n_steps += 1

print(theta0)
print(theta1)

# Sum of residuals (actual minus predicted); the prediction must be parenthesized
print("Error: {}".format(sum([ys[i] - (theta0 + theta1*xs[i]) for i in range(m)])))
plt.plot(xs, ys, 'ro')
plt.axis([0, max(xs), 0, max(ys)])
my_vals = list(np.arange(0, max(xs), 0.02))
# A list comprehension works in both Python 2 and 3 (map returns an iterator in Python 3)
plt.plot(my_vals, [theta0 + theta1*q for q in my_vals], '-bo')
plt.show()

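Here is the resulting line using the two optimized weights:

[Plot: the data points in red with the fitted line theta0 + theta1 * x in blue]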
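To give some intuition for why the learning rate matters so much here, the following is a minimal sketch and not part of the original code. It assumes the usual squared-error cost (1/2m) * sum((theta0 + theta1*x - y)^2), whose gradients match the ones used above. For that cost, plain gradient descent with a single learning rate is only stable when the rate is below 2 divided by the largest eigenvalue of the Hessian [[1, mean(x)], [mean(x), mean(x*x)]]. The data is synthetic (the real CSV is not reproduced here), and the helper names `max_stable_learning_rate` and `gradient_descent` are hypothetical.

```python
import math
import random

# Hypothetical synthetic data standing in for the housing CSV:
# x plays the role of a house size in square feet, y of a price.
random.seed(0)
xs = [random.uniform(800, 4500) for _ in range(50)]
ys = [70000 + 135 * x + random.gauss(0, 20000) for x in xs]


def max_stable_learning_rate(xs):
    """Return 2 / lambda_max for the Hessian [[1, mean(x)], [mean(x), mean(x^2)]]."""
    mean_x = sum(xs) / len(xs)
    mean_x2 = sum(x * x for x in xs) / len(xs)
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    trace = 1.0 + mean_x2
    det = mean_x2 - mean_x * mean_x
    lambda_max = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0
    return 2.0 / lambda_max


def gradient_descent(xs, ys, learning_rate, n_steps):
    """Plain batch gradient descent with one learning rate for both weights."""
    theta0 = theta1 = 0.0
    m = len(xs)
    for _ in range(n_steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both weights.
        theta0, theta1 = theta0 - learning_rate * grad0, theta1 - learning_rate * grad1
    return theta0, theta1


bound = max_stable_learning_rate(xs)
print("largest stable learning rate:", bound)
print("rate just below the bound:", gradient_descent(xs, ys, 0.9 * bound, 200))
print("rate just above the bound:", gradient_descent(xs, ys, 1.1 * bound, 200))
```

With x values in the low thousands, mean(x*x) is in the millions, so the bound lands on the order of 1e-7. That would be consistent with 1e-5 producing the sign-flipping, growing values shown in the question, though the exact threshold depends on the real data. Note also that a rate just below the bound keeps the weights finite but moves theta0 very slowly, which is one reason to give theta0 a larger rate of its own as done above, or to rescale x before fitting.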

Regarding python - Gradient descent is diverging, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46086193/