I have a simple model in PyTorch:
model = Network()
The details are:
Network(
(hidden): Linear(in_features=784, out_features=256, bias=True)
(output): Linear(in_features=256, out_features=10, bias=True)
(sigmoid): Sigmoid()
(softmax): Softmax(dim=1)
)
There are 3 layers of neurons in total: 1 input (784 neurons), 1 hidden (256 neurons), and 1 output (10 neurons). So there will be two weight layers. My understanding is that the two weight layers together must have two biases (just two floats, one per layer), right? (Correct me if I'm wrong.)
Now, after initializing my network, I was curious about these two bias values. I wanted to check the bias of the hidden layer, so I wrote:
model.hidden.bias
The result was not what I expected! I was expecting a single value. Here is what I actually got:
tensor([-1.6868e-02, -3.5661e-02, 1.2489e-02, -2.7880e-02, 1.4025e-02,
-2.6085e-02, 1.2625e-02, -3.1748e-02, 5.0335e-03, 3.8031e-03,
-3.1648e-02, -3.4881e-02, -2.0026e-02, 1.9728e-02, 6.2461e-03,
9.3936e-04, -5.9270e-03, -2.7183e-02, -1.9850e-02, -3.5693e-02,
-1.9393e-02, 2.6555e-02, 2.3482e-02, 2.1230e-02, -2.2175e-02,
-2.4386e-02, 3.4848e-02, -2.6044e-02, 1.3575e-02, 9.4125e-03,
3.0012e-02, -2.6078e-02, 7.1615e-05, -1.7061e-02, 6.6355e-03,
-3.4966e-02, 2.9311e-02, 1.4060e-02, -2.5763e-02, -1.4020e-02,
2.9852e-02, -7.9176e-03, -1.8396e-02, 1.6927e-02, -1.1001e-03,
1.5595e-02, 1.2169e-02, -1.2275e-02, -2.9270e-03, -6.5685e-04,
-2.4297e-02, 3.0048e-02, 2.9692e-03, -2.5398e-02, 2.9955e-03,
-9.3653e-04, -1.2932e-02, 2.4232e-02, -3.5182e-02, -1.6163e-02,
3.0025e-02, 3.1227e-02, -8.2498e-04, 2.7102e-02, -2.3830e-02,
-3.4958e-02, -1.1886e-02, 1.6097e-02, 1.4579e-02, -2.6744e-02,
1.1900e-02, -3.4855e-02, -4.2208e-03, -5.2035e-03, 1.7055e-02,
-4.8580e-03, 3.4088e-03, 1.6923e-02, 3.5570e-04, -3.0478e-02,
8.4647e-03, 2.5704e-02, -2.3255e-02, 6.9396e-03, -1.2521e-03,
-9.4101e-03, -2.5798e-02, -1.4438e-03, -7.2684e-03, 3.5417e-02,
-3.4388e-02, 1.3706e-02, -5.1430e-03, 1.6174e-02, 1.8135e-03,
-2.9018e-02, -2.9083e-02, 7.4100e-03, -2.7758e-02, 2.4367e-02,
-3.8350e-03, 9.4390e-03, -1.0844e-02, 1.6381e-02, -2.5268e-02,
1.3553e-02, -1.0545e-02, -1.3782e-02, 2.8519e-02, 2.3630e-02,
-1.9703e-02, -2.0147e-02, -1.0485e-02, 2.4637e-02, 1.9989e-02,
5.6601e-03, 1.9121e-02, -1.5286e-02, 2.5996e-02, -2.9833e-02,
-2.9458e-02, 2.3944e-02, -3.0107e-02, -1.2307e-02, -1.8419e-02,
3.3551e-02, 1.2396e-02, 2.9356e-02, 3.3274e-02, 5.4677e-03,
3.1715e-02, 1.3361e-02, 3.3042e-02, 2.7843e-03, 2.2837e-02,
-3.4981e-02, 3.2355e-02, -2.7658e-03, 2.2184e-02, -2.0203e-02,
-3.3264e-02, -3.4858e-02, 1.0820e-03, -1.4279e-02, -2.8041e-02,
4.1962e-03, 2.4266e-02, -3.5704e-02, -2.6172e-02, 2.3335e-02,
2.0657e-02, -3.0387e-03, -5.7096e-03, -1.1062e-02, 1.3450e-02,
-3.3965e-02, 1.9623e-03, -2.0067e-02, -3.3858e-02, -2.1931e-02,
-1.5414e-02, 2.4454e-02, 2.5668e-02, -1.1932e-02, 5.7540e-04,
1.5130e-02, 1.3916e-02, -2.1521e-02, -3.0575e-02, 1.8841e-02,
-2.3240e-02, -2.7297e-02, -3.2668e-02, -1.5544e-02, -5.9408e-03,
3.0241e-02, 2.2039e-02, -2.4389e-02, 3.1703e-02, 3.5305e-02,
-2.7501e-03, 2.0154e-02, -5.3489e-03, 1.4177e-02, 1.6829e-02,
3.3066e-02, -1.3425e-02, -3.2565e-02, 6.5624e-03, -1.5681e-02,
2.3047e-02, 6.5880e-03, -3.3803e-02, 2.3790e-02, -5.5061e-03,
2.9413e-02, 1.2290e-02, -1.0958e-02, 1.2680e-03, 1.3343e-02,
6.6689e-03, -2.2975e-03, -1.2068e-02, 1.6523e-02, -3.1612e-02,
-1.7529e-02, -2.2220e-02, -1.4723e-02, -1.3495e-02, -5.1805e-03,
-2.9620e-02, 3.0571e-02, -3.0999e-02, 3.3681e-03, 1.3579e-02,
1.4837e-02, 1.5694e-02, -1.1178e-02, 4.6233e-03, -2.2583e-02,
-3.5281e-03, 3.0918e-02, 2.6407e-02, 1.5822e-04, -3.0181e-03,
8.6989e-03, 2.8998e-02, -1.5975e-02, -3.1574e-02, -1.5609e-02,
1.0472e-02, 5.8976e-03, 7.0131e-03, -3.2047e-02, 2.6045e-02,
-2.8882e-02, -2.2121e-02, -3.2960e-02, 1.8268e-02, 3.0984e-02,
1.4824e-02, 3.0010e-02, -5.7523e-03, -2.0017e-02, 4.8700e-03,
1.4997e-02, -1.4898e-02, 6.8572e-03, 9.7713e-03, 1.3410e-02,
4.9619e-03, 3.1016e-02, 3.1240e-02, -3.0203e-02, 2.1435e-02,
2.7331e-02], requires_grad=True)
Can someone explain this behavior to me? Why do I get 256 values instead of one?
Edit 1:
Here is my understanding of the layers:
For a whole layer of neurons, the bias is just a single value. Am I right? But the output I see is 256 values. Why? Does PyTorch assume I want a bias for every individual neuron? Is that correct?
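For reference, the shapes can be inspected directly. Below is a minimal reconstruction of the Network class, inferred from the printed summary above (the original class definition is not shown in the question):

```python
import torch
from torch import nn

# Hypothetical re-creation of the Network above, just to inspect parameter shapes.
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

model = Network()
print(model.hidden.bias.shape)  # torch.Size([256]) - one bias per hidden neuron
print(model.output.bias.shape)  # torch.Size([10])  - one bias per output neuron
```

Checking `.shape` instead of printing the full tensor makes the structure obvious at a glance.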
Best Answer
First, it is important to understand what is happening inside one of these layers. When you write:
Linear(in_features=784, out_features=256, bias=True)
you are modeling a linear relationship between the input and the output. You are probably familiar with the basic form:
Y = MX + B
But instead of a scalar "slope" and "y-intercept", you have a weight matrix and a bias term. The relationship is still linear, but the inputs and outputs are matrices.
Y is the output, M is the weight matrix, X is the input, and B is the bias. Here the input X is an (N x 784) matrix and the output Y is an (N x 256) matrix (N being the number of samples).
If you are familiar with matrix multiplication, this means the weight matrix has shape (784 x 256) and the product is computed as XM, giving an (N x 256) matrix. So the bias B must be broadcastable to (N x 256) to compute XM + B: it is a vector of 256 values, one per output feature, added to every row.
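The shape arithmetic can be sketched in NumPy. Note that PyTorch's nn.Linear actually stores its weight as (out_features, in_features) and computes X·Wᵀ + B, which is equivalent; the batch size N below is arbitrary:

```python
import numpy as np

N = 32                          # batch size (arbitrary choice)
X = np.random.randn(N, 784)     # input:   (N x 784)
W = np.random.randn(256, 784)   # weights: (256 x 784), stored PyTorch-style
B = np.random.randn(256)        # bias: one value per output feature

Y = X @ W.T + B                 # B broadcasts over the N rows
print(Y.shape)                  # (32, 256)
```

The key point: B has only 256 entries, but broadcasting adds the same bias vector to each of the N rows of the product.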
In general, the number of values in the bias term is the same as the number of out_features: one bias per output neuron, not one per layer.

(A similar question on Stack Overflow: "python - How many biases should there be for each layer in a neural network (PyTorch)?", https://stackoverflow.com/questions/57836679/)
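This rule can be confirmed for both layers at once. The sketch below uses nn.Sequential rather than the asker's Network class, which is an equivalent stand-in for shape purposes:

```python
import torch
from torch import nn

# Same layer sizes as the question's Network, expressed as a Sequential.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.Sigmoid(),
    nn.Linear(256, 10),
    nn.Softmax(dim=1),
)

# Each Linear layer contributes one bias value per output feature.
for name, p in model.named_parameters():
    if name.endswith("bias"):
        print(name, p.numel())  # prints 256 for the first layer, 10 for the second
```

So the model has 256 + 10 = 266 bias parameters in total, not 2.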