本文介绍了神经网络XOR门未学习的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图通过使用2个感知器网络来制作XOR门,但由于某种原因该网络无法学习,当我在图形中绘制误差变化时,误差达到静态水平并在该区域振荡.

I'm trying to make a XOR gate by using 2 perceptron network but for some reason the network is not learning, when I plot the change of error in a graph the error comes to a static level and oscillates in that region.

此刻我没有给网络增加任何偏见.

I did not add any bias to the network at the moment.

import numpy as np

def S(x):
    return 1/(1+np.exp(-x))

win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15

# win = [[1,1], [2,2]]
# wout = [[1],[2]]

obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]

epoch = int(10000)
emajor = ""

for r in range(0,epoch):
    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]

        fdata = S(np.dot(1,fdata))

        hnw = np.dot(fdata,win)

        hnw = S(np.dot(fdata,win))

        out = np.dot(hnw,wout)

        out = S(out)

        diff = tar-out

        E = 0.5 * np.power(diff,2)
        emajor += str(E[0]) + ",\n"

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta

        wout_change = np.dot(nindelta_out[0], hnw)

        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in

        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]



f = open('xor.csv','w')
f.write(emajor) # python will convert \n to os.linesep
f.close() # you can omit in most cases as the destructor will call it

这是根据学习轮次而变化的错误.这样对吗?红色线是我期待的错误应该如何改变的线.

This is the error changing by the number of learning rounds. Is this correct? The red color line is the line I was expecting how the error should change.

我在代码中做错了什么?由于我似乎无法弄清楚是什么原因引起的错误.帮助非常感谢.

Anything wrong I'm doing in the code? As I can't seem to figure out what's causing the error. Help much appreciated.

预先感谢

推荐答案

这里是具有反向传播功能的一个隐藏层网络,可以对其进行自定义以运行具有relu,Sigmoid和其他激活的实验.经过几次实验,得出的结论是,使用relu时,网络性能更好,并且更快地达到收敛;而使用S型网络时,损耗值则发生波动.发生这种情况是因为"随着x的绝对值增加,S形坡度将变得越来越小".

Here is a one hidden layer network with backpropagation which can be customized to run experiments with relu, sigmoid and other activations. After several experiments it was concluded that with relu the network performed better and reached convergence sooner, while with sigmoid the loss value fluctuated. This happens because, "the gradient of sigmoids becomes increasingly small as the absolute value of x increases".

import numpy as np
import matplotlib.pyplot as plt
from operator import xor

class neuralNetwork():
    def __init__(self):
        # Define hyperparameters
        self.noOfInputLayers = 2
        self.noOfOutputLayers = 1
        self.noOfHiddenLayerNeurons = 2

        # Define weights
        self.W1 = np.random.rand(self.noOfInputLayers,self.noOfHiddenLayerNeurons)
        self.W2 = np.random.rand(self.noOfHiddenLayerNeurons,self.noOfOutputLayers)

    def relu(self,z):
        return np.maximum(0,z)

    def sigmoid(self,z):
        return 1/(1+np.exp(-z))

    def forward (self,X):
        self.z2 = np.dot(X,self.W1)
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2,self.W2)
        yHat = self.relu(self.z3)
        return yHat

    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J

    def costFunctionPrime(self,X,y):
        # Compute derivative with respect to W1 and W2
        delta3 = np.multiply(-(y-self.yHat),self.sigmoid(self.z3))
        djw2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3,self.W2.T)*self.sigmoid(self.z2)
        djw1 = np.dot(X.T,delta2)

        return djw1,djw2


if __name__ == "__main__":

    EPOCHS = 6000
    SCALAR = 0.01

    nn= neuralNetwork()
    COST_LIST = []

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]

    for epoch in xrange(1,EPOCHS):
        cost = 0
        for i in inputs:
            X = i #inputs
            y = xor(X[0][0],X[0][1])
            cost += nn.costFunction(X,y)[0]
            djw1,djw2 = nn.costFunctionPrime(X,y)
            nn.W1 = nn.W1 - SCALAR*djw1
            nn.W2 = nn.W2 - SCALAR*djw2
        COST_LIST.append(cost)

    plt.plot(np.arange(1,EPOCHS),COST_LIST)
    plt.ylim(0,1)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title(str('Epochs: '+str(EPOCHS)+', Scalar: '+str(SCALAR)))
    plt.show()

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]
    print "X\ty\ty_hat"
    for inp in inputs:
        print (inp[0][0],inp[0][1]),"\t",xor(inp[0][0],inp[0][1]),"\t",round(nn.forward(inp)[0][0],4)

最终结果:

X       y       y_hat
(0, 0)  0       0.0
(0, 1)  1       0.9997
(1, 0)  1       0.9997
(1, 1)  0       0.0005

训练后获得的权重为:

nn.w1

[ [-0.81781753  0.71323677]
  [ 0.48803631 -0.71286155] ]

nn.w2

[ [ 2.04849235]
  [ 1.40170791] ]

我发现以下youtube系列视频对于理解神经网络非常有帮助:神经网络变得神秘化

I found the following youtube series extremely helpful for understanding neural nets: Neural networks demystified

我所知道的很少,而且在这个答案中也可以解释.如果您想更好地了解神经网络,那么我建议您通过以下链接: cs231n:为一个神经元建模

There is only little which I know and also that can be explained in this answer. If you want an even better understanding of neural nets, then I would suggest you to go through the following link: cs231n: Modelling one neuron

这篇关于神经网络XOR门未学习的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-16 15:37
查看更多