数据挖掘与分析课程笔记

  • 参考教材:Data Mining and Analysis : MOHAMMED J.ZAKI, WAGNER MEIRA JR.

文章目录

  1. 数据挖掘与分析课程笔记(目录)
  2. 数据挖掘与分析课程笔记(Chapter 1)
  3. 数据挖掘与分析课程笔记(Chapter 2)
  4. 数据挖掘与分析课程笔记(Chapter 5)
  5. 数据挖掘与分析课程笔记(Chapter 7)
  6. 数据挖掘与分析课程笔记(Chapter 14)
  7. 数据挖掘与分析课程笔记(Chapter 15)
  8. 数据挖掘与分析课程笔记(Chapter 20)
  9. 数据挖掘与分析课程笔记(Chapter 21)


Chapter 5 Kernel Method:核方法

Example 5.1 略, ϕ ( 核映射 ) : Σ ∗ ( 输入空间 ) → R 4 ( 特征空间 ) \phi(核映射):\Sigma^*(输入空间)\to \mathbb{R}^4(特征空间) ϕ(核映射):Σ(输入空间)R4(特征空间)

Def.1. 假设核映射 ϕ : I → F \phi:\mathcal{I}\to \mathcal{F} ϕ:IF ϕ \phi ϕ 的核函数是指 K : I × I → R K:\mathcal{I}\times\mathcal{I}\to \mathbb{R} K:I×IR 使得 ∀ ( x i , x j ) ∈ I × I , K ( x i , x j ) = ϕ T ( x i ) ϕ ( x j ) \forall (\mathbf{x}_i,\mathbf{x}_j)\in \mathcal{I}\times\mathcal{I},K(\mathbf{x}_i,\mathbf{x}_j)=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j) (xi,xj)I×I,K(xi,xj)=ϕT(xi)ϕ(xj)

Example 5.2 ϕ : R 2 → R 3 \phi:\mathbb{R}^2\to \mathbb{R}^3 ϕ:R2R3 使得 ∀ a = ( a 1 , a 2 ) , ϕ ( a ) = ( a 1 2 , a 2 2 , 2 a 1 a 2 ) T \forall \mathbf{a}=(a_1,a_2),\phi(\mathbf{a})=(a_1^2,a_2^2,\sqrt2a_1a_2)^T a=(a1a2),ϕ(a)=(a12,a22,2 a1a2)T

注意到 K ( a , b ) = ϕ ( a ) T ϕ ( b ) = a 1 2 b 1 2 + a 2 2 b 2 2 + 2 a 1 2 a 2 2 b 1 2 b 2 2 K(\mathbf{a},\mathbf{b})=\phi(\mathbf{a})^T\phi(\mathbf{b})=a_1^2b_1^2+a_2^2b_2^2+2a_1^2a_2^2b_1^2b_2^2 K(a,b)=ϕ(a)Tϕ(b)=a12b12+a22b22+2a12a22b12b22 K : R 2 × R 2 → R K:\mathbb{R}^2\times\mathbb{R}^2\to \mathbb{R} K:R2×R2R

Remark

  1. 分析复杂数据
  2. 分析非线性特征(知乎搜核函数有什么作用)

Goal:在未知 ϕ \phi ϕ 的情况下,通过分析 K K K 来分析特征空间 F \mathcal{F} F 结果。

5.1 核矩阵

D = { x 1 , x 2 , … , x n } ⊂ I \mathbf{D}=\left\{\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n}\right\} \subset \mathcal{I} D={x1,x2,,xn}I,其核矩阵定义为: K = [ K ( x i , x j ) ] n × n \mathbf{K}=[K(\mathbf{x}_{i},\mathbf{x}_{j})]_{n\times n} K=[K(xi,xj)]n×n

Prop. 核矩阵 K \mathbf{K} K 是对称的且半正定的

Proof. K ( x i , x j ) = ϕ T ( x i ) ϕ ( x j ) = ϕ T ( x j ) ϕ ( x i ) = K ( x j , x i ) K(\mathbf{x}_{i},\mathbf{x}_{j})=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)=\phi^T(\mathbf{x}_j)\phi(\mathbf{x}_i)=K(\mathbf{x}_{j},\mathbf{x}_{i}) K(xi,xj)=ϕT(xi)ϕ(xj)=ϕT(xj)ϕ(xi)=K(xj,xi),故对称。

对于 ∀ a T ∈ R n \forall \mathbf{a}^{T}\in \mathbb{R}^n aTRn
a T K a = ∑ i = 1 n ∑ j = 1 n a i a j K ( x i , x j ) = ∑ i = 1 n ∑ j = 1 n a i a j ϕ ( x i ) T ϕ ( x j ) = ( ∑ i = 1 n a i ϕ ( x i ) ) T ( ∑ j = 1 n a j ϕ ( x j ) ) = ∥ ∑ i = 1 n a i ϕ ( x i ) ∥ 2 ≥ 0 \begin{aligned} \mathbf{a}^{T} \mathbf{K a} &=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i} a_{j} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \\ &=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i} a_{j} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=\left(\sum_{i=1}^{n} a_{i} \phi\left(\mathbf{x}_{i}\right)\right)^{T}\left(\sum_{j=1}^{n} a_{j} \phi\left(\mathbf{x}_{j}\right)\right) \\ &=\left\|\sum_{i=1}^{n} a_{i} \phi\left(\mathbf{x}_{i}\right)\right\|^2 \geq 0 \end{aligned} aTKa=i=1nj=1naiajK(xi,xj)=i=1nj=1naiajϕ(xi)Tϕ(xj)=(i=1naiϕ(xi))T(j=1najϕ(xj))= i=1naiϕ(xi) 20

5.1.1 核映射的重构

”经验核映射“

已知 D = { x i } i = 1 n ⊂ I \mathbf{D}=\left\{\mathbf{x}_{i}\right\}_{i=1}^{n} \subset \mathcal{I} D={xi}i=1nI 与核矩阵 K \mathbf{K} K

目标:寻找 ϕ : I → F ⊂ R n \phi:\mathcal{I} \to \mathcal{F} \subset \mathbb{R}^n ϕ:IFRn

首先尝试: ∀ x ∈ I , ϕ ( x ) = ( K ( x 1 , x ) , K ( x 2 , x ) , … , K ( x n , x ) ) T ∈ R n \forall \mathbf{x} \in \mathcal{I},\phi(\mathbf{x})=\left(K\left(\mathbf{x}_{1}, \mathbf{x}\right), K\left(\mathbf{x}_{2}, \mathbf{x}\right), \ldots, K\left(\mathbf{x}_{n}, \mathbf{x}\right)\right)^{T} \in \mathbb{R}^{n} xI,ϕ(x)=(K(x1,x),K(x2,x),,K(xn,x))TRn

检查: ϕ T ( x i ) ϕ ( x j ) ? = K ( x i , x j ) \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)?=K(\mathbf{x}_{i},\mathbf{x}_{j}) ϕT(xi)ϕ(xj)?=K(xi,xj)

左边 = ϕ ( x i ) T ϕ ( x j ) = ∑ k = 1 n K ( x k , x i ) K ( x k , x j ) = K i T K j =\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)=\sum\limits_{k=1}^{n} K\left(\mathbf{x}_{k}, \mathbf{x}_{i}\right) K\left(\mathbf{x}_{k}, \mathbf{x}_{j}\right)=\mathbf{K}_{i}^{T} \mathbf{K}_{j} =ϕ(xi)Tϕ(xj)=k=1nK(xk,xi)K(xk,xj)=KiTKj K i \mathbf{K}_{i} Ki 代表第 i i i 行或列要求太高。

考虑改进:寻找矩阵 A \mathbf{A} A 使得, K i T A K j = K ( x i , x j ) \mathbf{K}_{i}^{T} \mathbf{A} \mathbf{K}_{j}=\mathbf{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) KiTAKj=K(xi,xj),即 K T A K = K \mathbf{K}^{T} \mathbf{A} \mathbf{K}=\mathbf{K} KTAK=K

故只需取 A = K − 1 \mathbf{A}=\mathbf{K}^{-1} A=K1 即可( K \mathbf{K} K 可逆)

K \mathbf{K} K 正定, K − 1 \mathbf{K}^{-1} K1 也正定,即存在一个实矩阵 B \mathbf{B} B 满足 K − 1 = B T B \mathbf{K}^{-1}=\mathbf{B}^{T}\mathbf{B} K1=BTB

故经验核函数可定义为:
ϕ ( x ) = B ⋅ ( K ( x 1 , x ) , K ( x 2 , x ) , … , K ( x n , x ) ) T \phi(\mathbf{x})=\mathbf{B}\cdot\left(K\left(\mathbf{x}_{1}, \mathbf{x}\right), K\left(\mathbf{x}_{2}, \mathbf{x}\right), \ldots, K\left(\mathbf{x}_{n}, \mathbf{x}\right)\right)^{T} ϕ(x)=B(K(x1,x),K(x2,x),,K(xn,x))T
检查: ϕ T ( x i ) ϕ ( x j ) = ( B K i ) T ( B K j ) = K i T K − 1 K j = ( K T K − 1 K ) i , j = K ( x i , x j ) \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)=(\mathbf{B}\mathbf{K}_i)^T(\mathbf{B}\mathbf{K}_j)=\mathbf{K}_i^T\mathbf{K}^{-1}\mathbf{K}_j=(\mathbf{K}^T\mathbf{K}^{-1}\mathbf{K})_{i,j}=K(\mathbf{x}_{i},\mathbf{x}_{j}) ϕT(xi)ϕ(xj)=(BKi)T(BKj)=KiTK1Kj=(KTK1K)i,j=K(xi,xj)

5.1.2 特定数据的海塞核映射

对于对称半正定矩阵 K n × n \mathbf{K}_{n\times n} Kn×n,存在分解
K = U ( λ 1 0 ⋯ 0 0 λ 2 ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n ) U T = U Λ U T \mathbf{K}=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \end{array}\right)\mathbf{U}^{T}=\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T} K=U λ1000λ2000λn UT=UΛUT
λ i \lambda_{i} λi 为特征值, U = ( ∣ ∣ ∣ u 1 u 2 ⋯ u n ∣ ∣ ∣ ) \mathbf{U}=\left(\begin{array}{cccc} \mid & \mid & & \mid \\ \mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{n} \\ \mid & \mid & & \mid \end{array}\right) U= u1u2un 为单位正交矩阵, u i = ( u i 1 , u i 2 , … , u i n ) T ∈ R n \mathbf{u}_{i}=\left(u_{i 1}, u_{i 2}, \ldots, u_{i n}\right)^{T} \in \mathbb{R}^{n} ui=(ui1,ui2,,uin)TRn 为特征向量,即
K = λ 1 u 1 u 1 T + λ 2 u 2 u 2 T + ⋯ + λ n u n u n T K ( x i , x j ) = λ 1 u 1 i u 1 j + λ 2 u 2 i u 2 j ⋯ + λ n u n i u n j = ∑ k = 1 n λ k u k i u k j \mathbf{K}=\lambda_{1} \mathbf{u}_{1} \mathbf{u}_{1}^{T}+\lambda_{2} \mathbf{u}_{2} \mathbf{u}_{2}^{T}+\cdots+\lambda_{n} \mathbf{u}_{n} \mathbf{u}_{n}^{T}\\ \begin{aligned} \mathbf{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) &=\lambda_{1} u_{1 i} u_{1 j}+\lambda_{2} u_{2 i} u_{2 j} \cdots+\lambda_{n} u_{n i} u_{n j} \\ &=\sum_{k=1}^{n} \lambda_{k} u_{k i} u_{k j} \end{aligned} K=λ1u1u1T+λ2u2u2T++λnununTK(xi,xj)=λ1u1iu1j+λ2u2iu2j+λnuniunj=k=1nλkukiukj
定义海塞映射:
∀ x i ∈ D , ϕ ( x i ) = ( λ 1 u 1 i , λ 2 u 2 i , … , λ n u n i ) T \forall \mathbf{x}_i \in \mathbf {D}, \phi\left(\mathbf{x}_{i}\right)=\left(\sqrt{\lambda_{1}} u_{1 i}, \sqrt{\lambda_{2}} u_{2 i}, \ldots, \sqrt{\lambda_{n}} u_{n i}\right)^{T} xiD,ϕ(xi)=(λ1 u1i,λ2 u2i,,λn uni)T
检查:
ϕ ( x i ) T ϕ ( x j ) = ( λ 1 u 1 i , … , λ n u n i ) ( λ 1 u 1 j , … , λ n u n j ) T = λ 1 u 1 i u 1 j + ⋯ + λ n u n i u n j = K ( x i , x j ) \begin{aligned} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) &=\left(\sqrt{\lambda_{1}} u_{1 i}, \ldots, \sqrt{\lambda_{n}} u_{n i}\right)\left(\sqrt{\lambda_{1}} u_{1 j}, \ldots, \sqrt{\lambda_{n}} u_{n j}\right)^{T} \\ &=\lambda_{1} u_{1 i} u_{1 j}+\cdots+\lambda_{n} u_{n i} u_{n j}=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} ϕ(xi)Tϕ(xj)=(λ1 u1i,,λn uni)(λ1 u1j,,λn unj)T=λ1u1iu1j++λnuniunj=K(xi,xj)
注意:海塞映射中仅对 D \mathbf{D} D 中的数 x i \mathbf{x}_i xi 有定义。

5.2 向量核函数

R d × R d → R \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R} Rd×RdR

典型向量核函数:多项式核

∀ x , y ∈ R d , K q ( x , y ) = ϕ ( x ) T ϕ ( y ) = ( x T y + c ) q \forall \mathbf{x},\mathbf{y} \in \mathbb {R}^d, K_{q}(\mathbf{x}, \mathbf{y})=\phi(\mathbf{x})^{T} \phi(\mathbf{y})=\left(\mathbf{x}^{T} \mathbf{y} + c \right)^{q} x,yRd,Kq(x,y)=ϕ(x)Tϕ(y)=(xTy+c)q,其中 c ≥ 0 c\ge 0 c0

c = 0 c=0 c=0,齐次,否则为非齐次。

问题:构造核映射 ϕ : R d → F \phi:\mathbb{R}^d \to \mathcal{F} ϕ:RdF,使得 K q ( x , y ) = ϕ ( x ) T ϕ ( y ) K_{q}(\mathbf{x}, \mathbf{y})=\phi(\mathbf{x})^{T} \phi(\mathbf{y}) Kq(x,y)=ϕ(x)Tϕ(y)

注: q = 1 , c = 0 , ϕ ( x ) = x q=1,c=0, \phi (\mathbf{x})=\mathbf{x} q=1,c=0,ϕ(x)=x

示例: q = 2 , d = 2 q=2,d=2 q=2,d=2

高斯核自读

5.3 特征空间中基本核运算

ϕ : I → F , K : I × I → R \phi:\mathcal{I} \to \mathcal{F}, K:\mathcal{I} \times \mathcal{I}\to \mathbb{R} ϕ:IF,K:I×IR

  • 向量长度: ∥ ϕ ( x ) ∥ 2 = ϕ ( x ) T ϕ ( x ) = K ( x , x ) \|\phi(\mathbf{x})\|^{2}=\phi(\mathbf{x})^{T} \phi(\mathbf{x})=K(\mathbf{x}, \mathbf{x}) ϕ(x)2=ϕ(x)Tϕ(x)=K(x,x)

  • 距离:
    ∥ ϕ ( x i ) − ϕ ( x j ) ∥ 2 = ∥ ϕ ( x i ) ∥ 2 + ∥ ϕ ( x j ) ∥ 2 − 2 ϕ ( x i ) T ϕ ( x j ) = K ( x i , x i ) + K ( x j , x j ) − 2 K ( x i , x j ) \begin{aligned} \left\|\phi\left(\mathbf{x}_{i}\right)-\phi\left(\mathbf{x}_{j}\right)\right\|^{2} &=\left\|\phi\left(\mathbf{x}_{i}\right)\right\|^{2}+\left\|\phi\left(\mathbf{x}_{j}\right)\right\|^{2}-2 \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)+K\left(\mathbf{x}_{j}, \mathbf{x}_{j}\right)-2 K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} ϕ(xi)ϕ(xj)2=ϕ(xi)2+ϕ(xj)22ϕ(xi)Tϕ(xj)=K(xi,xi)+K(xj,xj)2K(xi,xj)

2 K ( x , y ) = ∥ ϕ ( x ) ∥ 2 + ∥ ϕ ( y ) ∥ 2 − ∥ ϕ ( x ) − ϕ ( y ) ∥ 2 2 K\left(\mathbf{x}, \mathbf{y}\right)=\left\|\phi\left(\mathbf{x}\right)\right\|^{2}+\left\|\phi\left(\mathbf{y}\right)\right\|^{2}-\left\|\phi\left(\mathbf{x}\right)-\phi\left(\mathbf{y}\right)\right\|^{2} 2K(x,y)=ϕ(x)2+ϕ(y)2ϕ(x)ϕ(y)2

代表 ϕ ( x ) \phi\left(\mathbf{x}\right) ϕ(x) ϕ ( y ) \phi\left(\mathbf{y}\right) ϕ(y) 的相似度

  • 平均值: μ ϕ = 1 n ∑ i = 1 n ϕ ( x i ) \boldsymbol{\mu}_{\phi}=\frac{1}{n} \sum\limits_{i=1}^{n} \phi\left(\mathbf{x}_{i}\right) μϕ=n1i=1nϕ(xi)
    ∥ μ ϕ ∥ 2 = μ ϕ T μ ϕ = ( 1 n ∑ i = 1 n ϕ ( x i ) ) T ( 1 n ∑ j = 1 n ϕ ( x j ) ) = 1 n 2 ∑ i = 1 n ∑ j = 1 n ϕ ( x i ) T ϕ ( x j ) = 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \begin{aligned} \left\|\boldsymbol{\mu}_{\phi}\right\|^{2} &=\boldsymbol{\mu}_{\phi}^{T} \boldsymbol{\mu}_{\phi} \\ &=\left(\frac{1}{n} \sum\limits_{i=1}^{n} \phi\left(\mathbf{x}_{i}\right)\right)^{T}\left(\frac{1}{n} \sum\limits_{j=1}^{n} \phi\left(\mathbf{x}_{j}\right)\right) \\ &=\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} μϕ 2=μϕTμϕ=(n1i=1nϕ(xi))T(n1j=1nϕ(xj))=n21i=1nj=1nϕ(xi)Tϕ(xj)=n21i=1nj=1nK(xi,xj)

  • 总方差: σ ϕ 2 = 1 n ∑ i = 1 n ∥ ϕ ( x i ) − μ ϕ ∥ 2 \sigma_{\phi}^{2}=\frac{1}{n} \sum\limits_{i=1}^{n}\left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2} σϕ2=n1i=1n ϕ(xi)μϕ 2 ∀ x i \forall \mathbf{x}_{i} xi
    ∥ ϕ ( x i ) − μ ϕ ∥ 2 = ∥ ϕ ( x i ) ∥ 2 − 2 ϕ ( x i ) T μ ϕ + ∥ μ ϕ ∥ 2 = K ( x i , x i ) − 2 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) \begin{aligned} \left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2} &=\left\|\phi\left(\mathbf{x}_{i}\right)\right\|^{2}-2 \phi\left(\mathbf{x}_{i}\right)^{T} \boldsymbol{\mu}_{\phi}+\left\|\boldsymbol{\mu}_{\phi}\right\|^{2} \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n} \sum_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum_{s=1}^{n} \sum_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right) \end{aligned} ϕ(xi)μϕ 2=ϕ(xi)22ϕ(xi)Tμϕ+ μϕ 2=K(xi,xi)n2j=1nK(xi,xj)+n21s=1nt=1nK(xs,xt)

    σ ϕ 2 = 1 n ∑ i = 1 n ∥ ϕ ( x i ) − μ ϕ ∥ 2 = 1 n ∑ i = 1 n ( K ( x i , x i ) − 2 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) ) = 1 n ∑ i = 1 n K ( x i , x i ) − 2 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) = 1 n ∑ i = 1 n K ( x i , x i ) − 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \begin{aligned} \sigma_{\phi}^{2} &=\frac{1}{n} \sum\limits_{i=1}^{n}\left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2}\\ &=\frac{1}{n} \sum\limits_{i=1}^{n}\left(K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum\limits_{s=1}^{n} \sum\limits_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right)\right)\\ &=\frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum\limits_{s=1}^{n} \sum\limits_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right)\\ &=\frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} σϕ2=n1i=1n ϕ(xi)μϕ 2=n1i=1n(K(xi,xi)n2j=1nK(xi,xj)+n21s=1nt=1nK(xs,xt))=n1i=1nK(xi,xi)n22i=1nj=1nK(xi,xj)+n21s=1nt=1nK(xs,xt)=n1i=1nK(xi,xi)n21i=1nj=1nK(xi,xj)

    1 n ∑ i = 1 n K ( x i , x i ) \frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right) n1i=1nK(xi,xi) K \mathbf{K} K 对角线平均值, 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) n21i=1nj=1nK(xi,xj) K \mathbf{K} K 所有元的平均值。

  • 中心化核矩阵:令 ϕ ^ ( x i ) = ϕ ( x i ) − μ ϕ \hat{\phi}\left(\mathbf{x}_{i}\right)=\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi} ϕ^(xi)=ϕ(xi)μϕ

    中心核函数 K ^ ( x i , x j ) = ϕ ^ ( x i ) T ϕ ^ ( x j ) \hat{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) =\hat{\phi}\left(\mathbf{x}_{i}\right)^{T} \hat{\phi}\left(\mathbf{x}_{j}\right) K^(xi,xj)=ϕ^(xi)Tϕ^(xj)
    K ^ ( x i , x j ) = ( ϕ ( x i ) − μ ϕ ) T ( ϕ ( x j ) − μ ϕ ) = ϕ ( x i ) T ϕ ( x j ) − ϕ ( x i ) T μ ϕ − ϕ ( x j ) T μ ϕ + μ ϕ T μ ϕ = K ( x i , x j ) − 1 n ∑ k = 1 n ϕ ( x i ) T ϕ ( x k ) − 1 n ∑ k = 1 n ϕ ( x j ) T ϕ ( x k ) + ∥ μ ϕ ∥ 2 = K ( x i , x j ) − 1 n ∑ k = 1 n K ( x i , x k ) − 1 n ∑ k = 1 n K ( x j , x k ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) \begin{aligned} \hat{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) &=\left(\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right)^{T}\left(\phi\left(\mathbf{x}_{j}\right)-\boldsymbol{\mu}_{\phi}\right) \\ &=\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)-\phi\left(\mathbf{x}_{i}\right)^{T} \boldsymbol{\mu}_{\phi}-\phi\left(\mathbf{x}_{j}\right)^{T} \boldsymbol{\mu}_{\phi}+\boldsymbol{\mu}_{\phi}^{T} \boldsymbol{\mu}_{\phi}\\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)-\frac{1}{n} \sum_{k=1}^{n} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{k}\right)-\frac{1}{n} \sum_{k=1}^{n} \phi\left(\mathbf{x}_{j}\right)^{T} \phi\left(\mathbf{x}_{k}\right)+\left\|\boldsymbol{\mu}_{\phi}\right\|^{2} \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)-\frac{1}{n} \sum_{k=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{k}\right)-\frac{1}{n} \sum_{k=1}^{n} K\left(\mathbf{x}_{j}, \mathbf{x}_{k}\right)+\frac{1}{n^{2}} \sum_{s=1}^{n} \sum_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right) \end{aligned} K^(xi,xj)=(ϕ(xi)μϕ)T(ϕ(xj)μϕ)=ϕ(xi)Tϕ(xj)ϕ(xi)Tμϕϕ(xj)Tμϕ+μϕTμϕ=K(xi,xj)n1k=1nϕ(xi)Tϕ(xk)n1k=1nϕ(xj)Tϕ(xk)+ μϕ 2=K(xi,xj)n1k=1nK(xi,xk)n1k=1nK(xj,xk)+n21s=1nt=1nK(xs,xt)

    K ^ = K − 1 n 1 n × n K − 1 n K 1 n × n + 1 n 2 1 n × n K 1 n × n = ( I − 1 n 1 n × n ) K ( I − 1 n 1 n × n ) \begin{aligned} \hat{\mathbf{K}} &=\mathbf{K}-\frac{1}{n} \mathbf{1}_{n \times n} \mathbf{K}-\frac{1}{n} \mathbf{K} \mathbf{1}_{n \times n}+\frac{1}{n^{2}} \mathbf{1}_{n \times n} \mathbf{K} \mathbf{1}_{n \times n} \\ &=\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right) \mathbf{K}\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right) \end{aligned} K^=Kn11n×nKn1K1n×n+n211n×nK1n×n=(In11n×n)K(In11n×n)
    注意 1 n × n \mathbf{1}_{n \times n} 1n×n 为全1矩阵,左乘之后每个元素为之前所在列的列和,右乘之和每个元素为之前所在行的行和,左右都乘之后每个元素即为原来所以元素之后。

  • 归一化核矩阵

    K n ( x i , x j ) = ϕ ( x i ) T ϕ ( x j ) ∥ ϕ ( x i ) ∥ ⋅ ∥ ϕ ( x j ) ∥ = K ( x i , x j ) K ( x i , x i ) ⋅ K ( x j , x j ) \mathbf{K}_{n}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=\frac{\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)}{\left\|\phi\left(\mathbf{x}_{i}\right)\right\| \cdot\left\|\phi\left(\mathbf{x}_{j}\right)\right\|}=\frac{K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)}{\sqrt{K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right) \cdot K\left(\mathbf{x}_{j}, \mathbf{x}_{j}\right)}} Kn(xi,xj)=ϕ(xi)ϕ(xj)ϕ(xi)Tϕ(xj)=K(xi,xi)K(xj,xj) K(xi,xj)

    W = diag ⁡ ( K ) = ( K ( x 1 , x 1 ) 0 ⋯ 0 0 K ( x 2 , x 2 ) ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ K ( x n , x n ) ) \mathbf{W}=\operatorname{diag}(\mathbf{K})=\left(\begin{array}{cccc} K\left(\mathbf{x}_{1}, \mathbf{x}_{1}\right) & 0 & \cdots & 0 \\ 0 & K\left(\mathbf{x}_{2}, \mathbf{x}_{2}\right) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & K\left(\mathbf{x}_{n}, \mathbf{x}_{n}\right) \end{array}\right) W=diag(K)= K(x1,x1)000K(x2,x2)000K(xn,xn) ,则 W − 1 / 2 ( x i , x i ) = 1 K ( x i , x i ) \mathbf{W}^{-1 / 2}\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)=\frac{1}{\sqrt{K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)}} W1/2(xi,xi)=K(xi,xi) 1

    K n = W − 1 / 2 ⋅ K ⋅ W − 1 / 2 \mathbf{K}_{n}=\mathbf{W}^{-1 / 2} \cdot \mathbf{K} \cdot \mathbf{W}^{-1 / 2} Kn=W1/2KW1/2

    矩阵左乘右乘对角阵的性质。

5.4 复杂对象的核

5.4.1 字串的谱核

考虑字符集 Σ \Sigma Σ (有限),定义 l l l-谱特征映射:
ϕ l : Σ ∗ → R ∣ Σ ∣ l , ∀ x ∈ Σ ∗ , ϕ l ( x ) = ( ⋯   , # ( α ) , ⋯   ) T \phi_l: \Sigma^{*} \rightarrow \mathbb{R}^{|\Sigma|^l},\forall \mathbf{x} \in \Sigma^{*},\phi_l(\mathbf{x})=\left ( \cdots,\#(\alpha),\cdots \right )^T ϕl:ΣR∣Σl,xΣ,ϕl(x)=(,#(α),)T
其中 # ( α ) \#(\alpha) #(α) 代表长度为 l l l 的子字串在 x \mathbf{x} x 中出现的次数。

l l l-谱核函数: K l : Σ ∗ × Σ ∗ → R \mathbf{K}_l:\Sigma^{*} \times \Sigma^{*} \to \mathbb{R} Kl:Σ×ΣR K l ( x , y ) = ϕ l ( x ) T ϕ l ( y ) \mathbf{K}_l (\mathbf{x},\mathbf{y})=\phi_l(\mathbf{x})^{T} \phi_l(\mathbf{y}) Kl(x,y)=ϕl(x)Tϕl(y)

谱核函数:计算 l = 0 l=0 l=0 l = ∞ l=\infty l=

5.4.2 图顶点的扩散核

  • 图:图 G = ( V , E ) G=(V,E) G=(V,E) 是指一个集合对,其中 V = { v 1 , ⋯   , v n } V=\{v_1,\cdots,v_n\} V={v1,,vn} 为顶点集, E = { ( v i , v j ) } E=\{(v_i,v_j)\} E={(vi,vj)} 为边集。现只考虑无向简单(没有自己到自己的边)图。

  • 邻接矩阵:图的邻接矩阵 A ( G ) : = [ A i j ] n × n A(G):=[A_{ij}]_{n\times n} A(G):=[Aij]n×n,其中 A i j = { 1 , ( v i , v j ) ∈ E 0 , ( v i , v j ) ∉ E A_{ij}=\left\{\begin{matrix} 1, (v_i,v_j)\in E\\ 0, (v_i,v_j) \notin E \end{matrix}\right. Aij={1,(vi,vj)E0,(vi,vj)/E

  • 度矩阵: Δ ( G ) : = d i a g ( d 1 , ⋯   , d n ) \Delta (G):=diag(d_1,\cdots,d_n) Δ(G):=diag(d1,,dn),其中 d i d_i di 代表顶点 v i v_i vi 的度,即与 v i v_i vi 相连的边的数目。

  • 拉普拉斯矩阵: L ( G ) : = A ( G ) − Δ ( G ) L(G):=A(G)-\Delta(G) L(G):=A(G)Δ(G)

    负拉普拉斯矩阵: L ( G ) : = − L ( G ) L(G):=-L(G) L(G):=L(G)

    它们是实对称

常用图的对称相似性矩阵 S \mathbf{S} S 是指 A ( G ) A(G) A(G) L ( G ) L(G) L(G) − L ( G ) -L(G) L(G)

问题:如何定义图顶点的核函数?( S \mathbf{S} S 并不一定是半正定)

- 幂核函数

S t \mathbf{S}^t St 作为核矩阵, S \mathbf{S} S 是对称的, t t t 为正整数

考虑 S 2 \mathbf{S}^2 S2 S 2 ( x i , x j ) = ∑ k = 1 n S i k S k j \mathbf{S}^2(x_i,x_j)=\sum\limits_{k=1}^{n}S_{ik}S_{kj} S2(xi,xj)=k=1nSikSkj

此公式说明 S 2 \mathbf{S}^2 S2 ( S l \mathbf{S}^l Sl) 的几何意义:顶点间长度为 2 2 2 ( l ) (l) (l) 的路径,描述顶点的相似性。

考虑 S l \mathbf{S}^l Sl 的特征值:设 S \mathbf{S} S 的特征值为 λ 1 , ⋯   , λ n ∈ R \lambda_1,\cdots,\lambda_n \in \mathbb{R} λ1,,λnR,则
S = U ( λ 1 0 ⋯ 0 0 λ 2 ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n ) U T = ∑ i = 1 n λ i u i u i T \mathbf{S}=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \end{array}\right)\mathbf{U}^{T}=\sum_{i=1}^{n} \lambda_{i}\mathbf{u}_{i} \mathbf{u}_{i}^{T} S=U λ1000λ2000λn UT=i=1nλiuiuiT
其中 U \mathbf{U} U 是以相应特征向量为列的正交矩阵, U = ( ∣ ∣ ∣ u 1 u 2 ⋯ u n ∣ ∣ ∣ ) \mathbf{U}=\left(\begin{array}{cccc} \mid & \mid & & \mid \\ \mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{n} \\ \mid & \mid & & \mid \end{array}\right) U= u1u2un
S l = U ( λ 1 l 0 ⋯ 0 0 λ 2 l ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n l ) U T \mathbf{S}^l=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1}^l & 0 & \cdots & 0 \\ 0 & \lambda_{2}^l & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n}^l \end{array}\right)\mathbf{U}^{T} Sl=U λ1l000λ2l000λnl UT
λ 1 l , ⋯   , λ n l \lambda_1^l,\cdots,\lambda_n^l λ1l,,λnl S l \mathbf{S}^l Sl 的特征值。

故若 l l l 是偶数, S l \mathbf{S}^l Sl 半正定。

- 指数扩散核函数

K : = e β S \mathbf{K}:=e^{\beta \mathbf{S}} K:=eβS 为核矩阵,其中 β > 0 \beta >0 β>0 为阻尼系数。(泰勒展开: e β x = ∑ l = 0 ∞ 1 l ! β l x l e^{\beta x}=\sum\limits_{l=0}^{\infin}\frac{1}{l!}\beta^{l}x^l eβx=l=0l!1βlxl
e β S = ∑ l = 0 ∞ 1 l ! β ′ S l = I + β S + 1 2 ! β 2 S 2 + 1 3 ! β 3 S 3 + ⋯ = ( ∑ i = 1 n u i u i T ) + ( ∑ i = 1 n u i β λ i u i T ) + ( ∑ i = 1 n u i 1 2 ! β 2 λ i 2 u i T ) + ⋯ = ∑ i = 1 n u i ( 1 + β λ i + 1 2 ! β 2 λ i 2 + ⋯   ) u i T = ∑ i = 1 n u i e β λ i u i T = U ( e β λ 1 0 ⋯ 0 0 e β λ 2 ⋯ 0 ⋮ ⋮ ⋱ 0 0 0 ⋯ e β λ n ) U T \begin{aligned} e^{\beta \mathbf{S}} &=\sum_{l=0}^{\infty} \frac{1}{l !} \beta^{\prime} \mathbf{S}^{l} \\ &=\mathbf{I}+\beta \mathbf{S}+\frac{1}{2 !} \beta^{2} \mathbf{S}^{2}+\frac{1}{3 !} \beta^{3} \mathbf{S}^{3}+\cdots\\ &=\left(\sum_{i=1}^{n} \mathbf{u}_{i} \mathbf{u}_{i}^{T}\right)+\left(\sum_{i=1}^{n} \mathbf{u}_{i} \beta \lambda_{i} \mathbf{u}_{i}^{T}\right)+\left(\sum_{i=1}^{n} \mathbf{u}_{i} \frac{1}{2 !} \beta^{2} \lambda_{i}^{2} \mathbf{u}_{i}^{T}\right)+\cdots \\ &=\sum_{i=1}^{n} \mathbf{u}_{i}\left(1+\beta \lambda_{i}+\frac{1}{2 !} \beta^{2} \lambda_{i}^{2}+\cdots\right) \mathbf{u}_{i}^{T} \\ &=\sum_{i=1}^{n} \mathbf{u}_{i} e ^{\beta \lambda_{i}} \mathbf{u}_{i}^{T} \\ &=\mathbf{U}\left(\begin{array}{cccc} e ^{\beta \lambda_{1}} & 0 & \cdots & 0 \\ 0 & e ^{\beta \lambda_{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & e ^{\beta \lambda_{n}} \end{array}\right) \mathbf{U}^{T} \end{aligned} eβS=l=0l!1βSl=I+βS+2!1β2S2+3!1β3S3+=(i=1nuiuiT)+(i=1nuiβλiuiT)+(i=1nui2!1β2λi2uiT)+=i=1nui(1+βλi+2!1β2λi2+)uiT=i=1nuieβλiuiT=U eβλ1000eβλ20000eβλn UT
K \mathbf{K} K 的特征值 e β λ 1 , ⋯   , e β λ n e ^{\beta \lambda_{1}},\cdots,e ^{\beta \lambda_{n}} eβλ1,,eβλn 完全非负, K \mathbf{K} K 为半正定。

- 纽因曼扩散核函数

K = ∑ l = 0 ∞ β l S l \mathbf{K}=\sum\limits_{l=0}^{\infty} \beta^l \mathbf{S}^l K=l=0βlSl 为核矩阵,注意到
K = I + β S + β 2 S 2 + β 3 S 3 + ⋯ = I + β S ( I + β S + β 2 S 2 + ⋯   ) = I + β S K \begin{aligned} \mathbf{K} &=\mathbf{I}+\beta \mathbf{S}+\beta^{2} \mathbf{S}^{2}+\beta^{3} \mathbf{S}^{3}+\cdots \\ &=\mathbf{I}+\beta \mathbf{S}\left(\mathbf{I}+\beta \mathbf{S}+\beta^{2} \mathbf{S}^{2}+\cdots\right) \\ &=\mathbf{I}+\beta \mathbf{S} \mathbf{K} \end{aligned} K=I+βS+β2S2+β3S3+=I+βS(I+βS+β2S2+)=I+βSK

K − β S K = I ( I − β S ) K = I K = ( I − β S ) − 1 \begin{aligned} \mathbf{K}-\beta \mathbf{S} \mathbf{K} &=\mathbf{I} \\ (\mathbf{I}-\beta \mathbf{S}) \mathbf{K} &=\mathbf{I} \\ \mathbf{K} &=(\mathbf{I}-\beta \mathbf{S})^{-1} \end{aligned} KβSK(IβS)KK=I=I=(IβS)1
I − β S \mathbf{I}-\beta \mathbf{S} IβS 可逆的前提下
K = ( U U T − U ( β Λ ) U T ) − 1 = ( U ( I − β Λ ) U T ) − 1 = U ( I − β Λ ) − 1 U T = U ( 1 1 − β λ 1 0 ⋯ 0 0 1 1 − β λ 2 ⋯ 0 ⋮ ⋮ ⋱ 0 0 0 ⋯ 1 1 − β λ n ) U T \begin{aligned} \mathbf{K} &=\left(\mathbf{U} \mathbf{U}^{T}-\mathbf{U}(\beta \mathbf{\Lambda}) \mathbf{U}^{T}\right)^{-1} \\ &=\left(\mathbf{U}(\mathbf{I}-\beta \mathbf{\Lambda}) \mathbf{U}^{T}\right)^{-1} \\ &=\mathbf{U}(\mathbf{I}-\beta \mathbf{\Lambda})^{-1} \mathbf{U}^{T}\\ &=\mathbf{U}\left(\begin{array}{cccc} \frac{1}{1-\beta\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{1-\beta\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \frac{1}{1-\beta\lambda_n} \end{array}\right) \mathbf{U}^{T} \end{aligned} K=(UUTU(βΛ)UT)1=(U(IβΛ)UT)1=U(IβΛ)1UT=U 1βλ110001βλ2100001βλn1 UT
要想其半正定,故有:
( 1 − β λ i ) − 1 ≥ 0 1 − β λ i ≥ 0 β λ i ≤ 1 \begin{aligned} \left(1-\beta \lambda_{i}\right)^{-1} & \geq 0 \\ 1-\beta \lambda_{i} & \geq 0 \\ \beta \lambda_{i} & \leq 1 \end{aligned} (1βλi)11βλiβλi001


11-04 11:59