1. Deriving the Linear Regression Loss Function

1.1 Related Functions

  • Linear regression function
    $$ y^i = \theta^T X^i \tag{1} $$
  • Gaussian distribution
    $$ f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-u)^2}{2\sigma^2}} \tag{2} $$
  • Joint probability density
    • If two random variables are independent, their joint density function equals the product of their marginal density functions
      $$ P(AB) = P(A)\,P(B) $$
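As a small illustration of equation (2) and the independence rule, the sketch below evaluates the Gaussian density and multiplies densities of independent samples. The function names `gaussian_pdf` and `joint_pdf_independent` are hypothetical, chosen only for this example.

```python
import math

def gaussian_pdf(x, u=0.0, sigma=1.0):
    """Density of a Gaussian with mean u and standard deviation sigma, as in equation (2)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - u) ** 2) / (2.0 * sigma ** 2))

def joint_pdf_independent(xs, u=0.0, sigma=1.0):
    """Joint density of independent samples: the product of the individual densities."""
    return math.prod(gaussian_pdf(x, u, sigma) for x in xs)

# Density at the mean of a standard Gaussian, and the joint density of two such samples.
peak = gaussian_pdf(0.0)
joint = joint_pdf_independent([0.0, 0.0])
```

Here `joint` equals `peak` squared, which is the product rule applied to two independent samples.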

1.2 Deriving the Loss Function

  • Given: the linear regression errors $\epsilon_i = y^i - \theta^T X^i$ are independent and follow a Gaussian distribution with mean 0 and variance $\sigma^2$
  • Maximum likelihood estimation: write the likelihood of the observed errors and look for the $\theta$ that maximizes it. The last step substitutes (1) and (2) with $u = 0$
    $$
    \begin{aligned}
    L_\theta\{\epsilon_1,\epsilon_2,\epsilon_3,\dots,\epsilon_m\}
    &= f(\epsilon_1,\epsilon_2,\epsilon_3,\dots,\epsilon_m \mid u,\sigma^2) \\
    &= f(\epsilon_1 \mid u,\sigma^2)\, f(\epsilon_2 \mid u,\sigma^2) \cdots f(\epsilon_m \mid u,\sigma^2) \\
    &= \prod_{i=1}^{m} f(\epsilon_i \mid u,\sigma^2) \\
    &= \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y^i-\theta^T X^i)^2}{2\sigma^2}}
    \end{aligned}
    $$
  • Take the logarithm of the likelihood
    $$
    \begin{aligned}
    \ln L_\theta\{\epsilon_1,\epsilon_2,\epsilon_3,\dots,\epsilon_m\}
    &= \ln \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y^i-\theta^T X^i)^2}{2\sigma^2}} \\
    &= \sum_{i=1}^{m} \ln\left( \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y^i-\theta^T X^i)^2}{2\sigma^2}} \right) \\
    &= \sum_{i=1}^{m} \ln \frac{1}{\sqrt{2\pi}\sigma} + \sum_{i=1}^{m} \ln e^{-\frac{(y^i-\theta^T X^i)^2}{2\sigma^2}} \\
    &= m \ln \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y^i-\theta^T X^i)^2
    \end{aligned}
    $$
  • The following quantities do not depend on $\theta$, so they are constants in the maximization
    $$ m\ln\frac{1}{\sqrt{2\pi}\sigma}, \quad \frac{1}{2\sigma^2} $$
  • Maximizing $L_\theta\{\epsilon_1,\epsilon_2,\epsilon_3,\dots,\epsilon_m\}$ is therefore the same as minimizing $\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y^i-\theta^T X^i)^2$, which in turn is the same as minimizing $\sum_{i=1}^{m}(y^i-\theta^T X^i)^2$. A separate bias term $b$ can be absorbed into $\theta$ by adding a constant feature 1 to each $X^i$
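The equivalence between maximizing the likelihood and minimizing the sum of squared errors can be checked numerically. The sketch below assumes a single feature, a scalar `theta`, and `sigma = 1`; the toy data and the grid of candidate values are made up for illustration.

```python
import math

def log_likelihood(theta, xs, ys, sigma=1.0):
    """Sum of log Gaussian densities (mean 0) of the errors y - theta * x."""
    log_coeff = math.log(1.0 / (math.sqrt(2.0 * math.pi) * sigma))
    total = 0.0
    for x, y in zip(xs, ys):
        err = y - theta * x
        total += log_coeff - err ** 2 / (2.0 * sigma ** 2)
    return total

def sum_squared_errors(theta, xs, ys):
    """Sum of squared errors for the same one-feature model."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Candidate values of theta from 1.00 to 3.00 in steps of 0.01.
candidates = [1.0 + 0.01 * k for k in range(201)]
best_by_likelihood = max(candidates, key=lambda t: log_likelihood(t, xs, ys))
best_by_squares = min(candidates, key=lambda t: sum_squared_errors(t, xs, ys))
```

Both searches select the same candidate, because the log-likelihood is a constant minus a positive multiple of the sum of squared errors.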

1.3 Summary

  • The resulting linear regression loss function is based on the mean squared error (MSE), with any bias $b$ absorbed into $\theta$ by adding a constant feature 1 to each $X^i$
    $$
    \begin{aligned}
    Loss_\theta &= \frac{1}{2m}\sum_{i=1}^{m}(\theta^T X^i - y^i)^2 \\
    &= \frac{1}{2m}\sum_{i=1}^{m}(\theta^T X^i - y^i)(\theta^T X^i - y^i) \\
    &= \frac{1}{2m}(X\theta - y)^T(X\theta - y)
    \end{aligned}
    $$
  • The positive constant $\frac{1}{m}$ does not change the minimizing $\theta$, so it is dropped. With $X$ the matrix whose $i$-th row is $(X^i)^T$ and $y$ the vector of the $y^i$, the quantity to minimize expands as
    $$
    \begin{aligned}
    Loss_\theta &= \frac{1}{2}(X\theta - y)^T(X\theta - y) \\
    &= \frac{1}{2}(\theta^T X^T - y^T)(X\theta - y) \\
    &= \frac{1}{2}(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y)
    \end{aligned}
    $$
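To see that the per-sample sum and the matrix form describe the same loss, the sketch below computes both with NumPy and keeps the factor 1/(2m) in each. The data, the column of ones standing in for the bias, and the function names are assumptions made for this example.

```python
import numpy as np

def loss_sum(theta, X, y):
    """Loss as a per-sample sum: 1/(2m) * sum_i (theta^T X^i - y^i)^2."""
    m = len(y)
    return sum((theta @ X[i] - y[i]) ** 2 for i in range(m)) / (2 * m)

def loss_matrix(theta, X, y):
    """Same loss in matrix form: 1/(2m) * (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

# Hypothetical data; the first column of ones plays the role of the bias b.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
theta = np.array([0.5, 0.5])

value_sum = loss_sum(theta, X, y)
value_matrix = loss_matrix(theta, X, y)
```

The two values agree up to floating-point rounding. The matrix form is usually preferred in practice because it avoids the explicit Python loop.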

2. Deriving the Solution of the Loss Function

2.1 Matrix Operation Rules

$$
\begin{aligned}
(KA)^T &= KA^T \\
(A+B)^T &= A^T + B^T \\
(AB)^T &= B^T A^T \\
(A^T)^T &= A \\
\frac{\partial\, \theta^T A \theta}{\partial \theta} &= 2A\theta \\
\frac{\partial\, \theta^T A}{\partial \theta} &= A \\
\frac{\partial\, A\theta}{\partial \theta} &= A^T
\end{aligned}
$$

Here $K$ is a scalar. The rule $\frac{\partial\, \theta^T A \theta}{\partial \theta} = 2A\theta$ holds when $A$ is symmetric (in general the result is $(A + A^T)\theta$); this is the case for $A = X^T X$.
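The two derivative rules used most in the next step can be sanity-checked with finite differences. This is only a numerical sketch: the symmetric matrix `A`, the vector `a`, and the helper `numerical_gradient` are hypothetical and chosen for illustration.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Hypothetical symmetric matrix A and vector a.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
a = np.array([1.0, -2.0])
theta = np.array([0.5, -1.5])

# d(theta^T A theta)/d(theta) should match 2 A theta for symmetric A.
grad_quadratic = numerical_gradient(lambda t: t @ A @ t, theta)
# d(theta^T a)/d(theta) should match a.
grad_linear = numerical_gradient(lambda t: t @ a, theta)
```

Comparing `grad_quadratic` with `2 * A @ theta` and `grad_linear` with `a` using `np.allclose` confirms both rules for this example.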

2.2 Solving the Loss Function by Differentiation

$$
\begin{aligned}
\frac{\partial Loss_\theta}{\partial \theta}
&= \frac{\partial}{\partial \theta}\, \frac{1}{2}\left(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y\right) \\
&= \frac{1}{2}\left(2X^T X\theta - X^T y - (y^T X)^T + 0\right) \\
&= \frac{1}{2}\left(2X^T X\theta - 2X^T y\right) \\
&= X^T X\theta - X^T y
\end{aligned}
$$
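Applying the rules from 2.1 to the expanded loss gives the gradient $X^T X\theta - X^T y$. Setting it to zero yields the normal equation $X^T X\theta = X^T y$, so $\theta = (X^T X)^{-1} X^T y$ whenever $X^T X$ is invertible. The sketch below solves that system with NumPy; the data and function names are assumptions for illustration, and it uses `np.linalg.solve` rather than forming the inverse explicitly.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient(theta, X, y):
    """Gradient of 1/2 (X theta - y)^T (X theta - y): X^T X theta - X^T y."""
    return X.T @ X @ theta - X.T @ y

# Hypothetical data; the first column of ones acts as the bias term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated exactly from y = 1 + 2x

theta = fit_normal_equation(X, y)
```

For this noise-free data the solution recovers the intercept 1 and slope 2, and the gradient at `theta` is zero up to rounding. When $X^T X$ is singular or badly conditioned, a least-squares routine such as `np.linalg.lstsq` is a safer choice.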

11-30 11:36