数据挖掘与分析课程笔记
- 参考教材:Data Mining and Analysis : MOHAMMED J.ZAKI, WAGNER MEIRA JR.
文章目录
- 数据挖掘与分析课程笔记(目录)
- 数据挖掘与分析课程笔记(Chapter 1)
- 数据挖掘与分析课程笔记(Chapter 2)
- 数据挖掘与分析课程笔记(Chapter 5)
- 数据挖掘与分析课程笔记(Chapter 7)
- 数据挖掘与分析课程笔记(Chapter 14)
- 数据挖掘与分析课程笔记(Chapter 15)
- 数据挖掘与分析课程笔记(Chapter 20)
- 数据挖掘与分析课程笔记(Chapter 21)
笔记目录
Chapter 5 Kernel Method:核方法
Example 5.1 略, ϕ ( 核映射 ) : Σ ∗ ( 输入空间 ) → R 4 ( 特征空间 ) \phi(核映射):\Sigma^*(输入空间)\to \mathbb{R}^4(特征空间) ϕ(核映射):Σ∗(输入空间)→R4(特征空间)
Def.1. 假设核映射 ϕ : I → F \phi:\mathcal{I}\to \mathcal{F} ϕ:I→F, ϕ \phi ϕ 的核函数是指 K : I × I → R K:\mathcal{I}\times\mathcal{I}\to \mathbb{R} K:I×I→R 使得 ∀ ( x i , x j ) ∈ I × I , K ( x i , x j ) = ϕ T ( x i ) ϕ ( x j ) \forall (\mathbf{x}_i,\mathbf{x}_j)\in \mathcal{I}\times\mathcal{I},K(\mathbf{x}_i,\mathbf{x}_j)=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j) ∀(xi,xj)∈I×I,K(xi,xj)=ϕT(xi)ϕ(xj)
Example 5.2 设 ϕ : R 2 → R 3 \phi:\mathbb{R}^2\to \mathbb{R}^3 ϕ:R2→R3 使得 ∀ a = ( a 1 , a 2 ) , ϕ ( a ) = ( a 1 2 , a 2 2 , 2 a 1 a 2 ) T \forall \mathbf{a}=(a_1,a_2),\phi(\mathbf{a})=(a_1^2,a_2^2,\sqrt2a_1a_2)^T ∀a=(a1,a2),ϕ(a)=(a12,a22,2 a1a2)T
注意到 K ( a , b ) = ϕ ( a ) T ϕ ( b ) = a 1 2 b 1 2 + a 2 2 b 2 2 + 2 a 1 2 a 2 2 b 1 2 b 2 2 K(\mathbf{a},\mathbf{b})=\phi(\mathbf{a})^T\phi(\mathbf{b})=a_1^2b_1^2+a_2^2b_2^2+2a_1^2a_2^2b_1^2b_2^2 K(a,b)=ϕ(a)Tϕ(b)=a12b12+a22b22+2a12a22b12b22, K : R 2 × R 2 → R K:\mathbb{R}^2\times\mathbb{R}^2\to \mathbb{R} K:R2×R2→R
Remark:
- 分析复杂数据
- 分析非线性特征(知乎搜核函数有什么作用)
Goal:在未知 ϕ \phi ϕ 的情况下,通过分析 K K K 来分析特征空间 F \mathcal{F} F 结果。
5.1 核矩阵
设 D = { x 1 , x 2 , … , x n } ⊂ I \mathbf{D}=\left\{\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n}\right\} \subset \mathcal{I} D={x1,x2,…,xn}⊂I,其核矩阵定义为: K = [ K ( x i , x j ) ] n × n \mathbf{K}=[K(\mathbf{x}_{i},\mathbf{x}_{j})]_{n\times n} K=[K(xi,xj)]n×n
Prop. 核矩阵 K \mathbf{K} K 是对称的且半正定的
Proof. K ( x i , x j ) = ϕ T ( x i ) ϕ ( x j ) = ϕ T ( x j ) ϕ ( x i ) = K ( x j , x i ) K(\mathbf{x}_{i},\mathbf{x}_{j})=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)=\phi^T(\mathbf{x}_j)\phi(\mathbf{x}_i)=K(\mathbf{x}_{j},\mathbf{x}_{i}) K(xi,xj)=ϕT(xi)ϕ(xj)=ϕT(xj)ϕ(xi)=K(xj,xi),故对称。
对于 ∀ a T ∈ R n \forall \mathbf{a}^{T}\in \mathbb{R}^n ∀aT∈Rn,
a T K a = ∑ i = 1 n ∑ j = 1 n a i a j K ( x i , x j ) = ∑ i = 1 n ∑ j = 1 n a i a j ϕ ( x i ) T ϕ ( x j ) = ( ∑ i = 1 n a i ϕ ( x i ) ) T ( ∑ j = 1 n a j ϕ ( x j ) ) = ∥ ∑ i = 1 n a i ϕ ( x i ) ∥ 2 ≥ 0 \begin{aligned} \mathbf{a}^{T} \mathbf{K a} &=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i} a_{j} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \\ &=\sum_{i=1}^{n} \sum_{j=1}^{n} a_{i} a_{j} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=\left(\sum_{i=1}^{n} a_{i} \phi\left(\mathbf{x}_{i}\right)\right)^{T}\left(\sum_{j=1}^{n} a_{j} \phi\left(\mathbf{x}_{j}\right)\right) \\ &=\left\|\sum_{i=1}^{n} a_{i} \phi\left(\mathbf{x}_{i}\right)\right\|^2 \geq 0 \end{aligned} aTKa=i=1∑nj=1∑naiajK(xi,xj)=i=1∑nj=1∑naiajϕ(xi)Tϕ(xj)=(i=1∑naiϕ(xi))T(j=1∑najϕ(xj))=∥ ∥i=1∑naiϕ(xi)∥ ∥2≥0
5.1.1 核映射的重构
”经验核映射“
已知 D = { x i } i = 1 n ⊂ I \mathbf{D}=\left\{\mathbf{x}_{i}\right\}_{i=1}^{n} \subset \mathcal{I} D={xi}i=1n⊂I 与核矩阵 K \mathbf{K} K
目标:寻找 ϕ : I → F ⊂ R n \phi:\mathcal{I} \to \mathcal{F} \subset \mathbb{R}^n ϕ:I→F⊂Rn
首先尝试: ∀ x ∈ I , ϕ ( x ) = ( K ( x 1 , x ) , K ( x 2 , x ) , … , K ( x n , x ) ) T ∈ R n \forall \mathbf{x} \in \mathcal{I},\phi(\mathbf{x})=\left(K\left(\mathbf{x}_{1}, \mathbf{x}\right), K\left(\mathbf{x}_{2}, \mathbf{x}\right), \ldots, K\left(\mathbf{x}_{n}, \mathbf{x}\right)\right)^{T} \in \mathbb{R}^{n} ∀x∈I,ϕ(x)=(K(x1,x),K(x2,x),…,K(xn,x))T∈Rn
检查: ϕ T ( x i ) ϕ ( x j ) ? = K ( x i , x j ) \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)?=K(\mathbf{x}_{i},\mathbf{x}_{j}) ϕT(xi)ϕ(xj)?=K(xi,xj)
左边 = ϕ ( x i ) T ϕ ( x j ) = ∑ k = 1 n K ( x k , x i ) K ( x k , x j ) = K i T K j =\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)=\sum\limits_{k=1}^{n} K\left(\mathbf{x}_{k}, \mathbf{x}_{i}\right) K\left(\mathbf{x}_{k}, \mathbf{x}_{j}\right)=\mathbf{K}_{i}^{T} \mathbf{K}_{j} =ϕ(xi)Tϕ(xj)=k=1∑nK(xk,xi)K(xk,xj)=KiTKj, K i \mathbf{K}_{i} Ki 代表第 i i i 行或列要求太高。
考虑改进:寻找矩阵 A \mathbf{A} A 使得, K i T A K j = K ( x i , x j ) \mathbf{K}_{i}^{T} \mathbf{A} \mathbf{K}_{j}=\mathbf{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) KiTAKj=K(xi,xj),即 K T A K = K \mathbf{K}^{T} \mathbf{A} \mathbf{K}=\mathbf{K} KTAK=K
故只需取 A = K − 1 \mathbf{A}=\mathbf{K}^{-1} A=K−1 即可( K \mathbf{K} K 可逆)
若 K \mathbf{K} K 正定, K − 1 \mathbf{K}^{-1} K−1 也正定,即存在一个实矩阵 B \mathbf{B} B 满足 K − 1 = B T B \mathbf{K}^{-1}=\mathbf{B}^{T}\mathbf{B} K−1=BTB
故经验核函数可定义为:
ϕ ( x ) = B ⋅ ( K ( x 1 , x ) , K ( x 2 , x ) , … , K ( x n , x ) ) T \phi(\mathbf{x})=\mathbf{B}\cdot\left(K\left(\mathbf{x}_{1}, \mathbf{x}\right), K\left(\mathbf{x}_{2}, \mathbf{x}\right), \ldots, K\left(\mathbf{x}_{n}, \mathbf{x}\right)\right)^{T} ϕ(x)=B⋅(K(x1,x),K(x2,x),…,K(xn,x))T
检查: ϕ T ( x i ) ϕ ( x j ) = ( B K i ) T ( B K j ) = K i T K − 1 K j = ( K T K − 1 K ) i , j = K ( x i , x j ) \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)=(\mathbf{B}\mathbf{K}_i)^T(\mathbf{B}\mathbf{K}_j)=\mathbf{K}_i^T\mathbf{K}^{-1}\mathbf{K}_j=(\mathbf{K}^T\mathbf{K}^{-1}\mathbf{K})_{i,j}=K(\mathbf{x}_{i},\mathbf{x}_{j}) ϕT(xi)ϕ(xj)=(BKi)T(BKj)=KiTK−1Kj=(KTK−1K)i,j=K(xi,xj)
5.1.2 特定数据的海塞核映射
对于对称半正定矩阵 K n × n \mathbf{K}_{n\times n} Kn×n,存在分解
K = U ( λ 1 0 ⋯ 0 0 λ 2 ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n ) U T = U Λ U T \mathbf{K}=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \end{array}\right)\mathbf{U}^{T}=\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T} K=U⎝ ⎛λ10⋮00λ2⋮0⋯⋯⋱⋯00⋮λn⎠ ⎞UT=UΛUT
λ i \lambda_{i} λi 为特征值, U = ( ∣ ∣ ∣ u 1 u 2 ⋯ u n ∣ ∣ ∣ ) \mathbf{U}=\left(\begin{array}{cccc} \mid & \mid & & \mid \\ \mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{n} \\ \mid & \mid & & \mid \end{array}\right) U=⎝ ⎛∣u1∣∣u2∣⋯∣un∣⎠ ⎞ 为单位正交矩阵, u i = ( u i 1 , u i 2 , … , u i n ) T ∈ R n \mathbf{u}_{i}=\left(u_{i 1}, u_{i 2}, \ldots, u_{i n}\right)^{T} \in \mathbb{R}^{n} ui=(ui1,ui2,…,uin)T∈Rn 为特征向量,即
K = λ 1 u 1 u 1 T + λ 2 u 2 u 2 T + ⋯ + λ n u n u n T K ( x i , x j ) = λ 1 u 1 i u 1 j + λ 2 u 2 i u 2 j ⋯ + λ n u n i u n j = ∑ k = 1 n λ k u k i u k j \mathbf{K}=\lambda_{1} \mathbf{u}_{1} \mathbf{u}_{1}^{T}+\lambda_{2} \mathbf{u}_{2} \mathbf{u}_{2}^{T}+\cdots+\lambda_{n} \mathbf{u}_{n} \mathbf{u}_{n}^{T}\\ \begin{aligned} \mathbf{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) &=\lambda_{1} u_{1 i} u_{1 j}+\lambda_{2} u_{2 i} u_{2 j} \cdots+\lambda_{n} u_{n i} u_{n j} \\ &=\sum_{k=1}^{n} \lambda_{k} u_{k i} u_{k j} \end{aligned} K=λ1u1u1T+λ2u2u2T+⋯+λnununTK(xi,xj)=λ1u1iu1j+λ2u2iu2j⋯+λnuniunj=k=1∑nλkukiukj
定义海塞映射:
∀ x i ∈ D , ϕ ( x i ) = ( λ 1 u 1 i , λ 2 u 2 i , … , λ n u n i ) T \forall \mathbf{x}_i \in \mathbf {D}, \phi\left(\mathbf{x}_{i}\right)=\left(\sqrt{\lambda_{1}} u_{1 i}, \sqrt{\lambda_{2}} u_{2 i}, \ldots, \sqrt{\lambda_{n}} u_{n i}\right)^{T} ∀xi∈D,ϕ(xi)=(λ1 u1i,λ2 u2i,…,λn uni)T
检查:
ϕ ( x i ) T ϕ ( x j ) = ( λ 1 u 1 i , … , λ n u n i ) ( λ 1 u 1 j , … , λ n u n j ) T = λ 1 u 1 i u 1 j + ⋯ + λ n u n i u n j = K ( x i , x j ) \begin{aligned} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) &=\left(\sqrt{\lambda_{1}} u_{1 i}, \ldots, \sqrt{\lambda_{n}} u_{n i}\right)\left(\sqrt{\lambda_{1}} u_{1 j}, \ldots, \sqrt{\lambda_{n}} u_{n j}\right)^{T} \\ &=\lambda_{1} u_{1 i} u_{1 j}+\cdots+\lambda_{n} u_{n i} u_{n j}=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} ϕ(xi)Tϕ(xj)=(λ1 u1i,…,λn uni)(λ1 u1j,…,λn unj)T=λ1u1iu1j+⋯+λnuniunj=K(xi,xj)
注意:海塞映射中仅对 D \mathbf{D} D 中的数 x i \mathbf{x}_i xi 有定义。
5.2 向量核函数
R d × R d → R \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R} Rd×Rd→R
典型向量核函数:多项式核
∀ x , y ∈ R d , K q ( x , y ) = ϕ ( x ) T ϕ ( y ) = ( x T y + c ) q \forall \mathbf{x},\mathbf{y} \in \mathbb {R}^d, K_{q}(\mathbf{x}, \mathbf{y})=\phi(\mathbf{x})^{T} \phi(\mathbf{y})=\left(\mathbf{x}^{T} \mathbf{y} + c \right)^{q} ∀x,y∈Rd,Kq(x,y)=ϕ(x)Tϕ(y)=(xTy+c)q,其中 c ≥ 0 c\ge 0 c≥0
若 c = 0 c=0 c=0,齐次,否则为非齐次。
问题:构造核映射 ϕ : R d → F \phi:\mathbb{R}^d \to \mathcal{F} ϕ:Rd→F,使得 K q ( x , y ) = ϕ ( x ) T ϕ ( y ) K_{q}(\mathbf{x}, \mathbf{y})=\phi(\mathbf{x})^{T} \phi(\mathbf{y}) Kq(x,y)=ϕ(x)Tϕ(y)
注: q = 1 , c = 0 , ϕ ( x ) = x q=1,c=0, \phi (\mathbf{x})=\mathbf{x} q=1,c=0,ϕ(x)=x
示例: q = 2 , d = 2 q=2,d=2 q=2,d=2
高斯核自读
5.3 特征空间中基本核运算
ϕ : I → F , K : I × I → R \phi:\mathcal{I} \to \mathcal{F}, K:\mathcal{I} \times \mathcal{I}\to \mathbb{R} ϕ:I→F,K:I×I→R
-
向量长度: ∥ ϕ ( x ) ∥ 2 = ϕ ( x ) T ϕ ( x ) = K ( x , x ) \|\phi(\mathbf{x})\|^{2}=\phi(\mathbf{x})^{T} \phi(\mathbf{x})=K(\mathbf{x}, \mathbf{x}) ∥ϕ(x)∥2=ϕ(x)Tϕ(x)=K(x,x)
-
距离:
∥ ϕ ( x i ) − ϕ ( x j ) ∥ 2 = ∥ ϕ ( x i ) ∥ 2 + ∥ ϕ ( x j ) ∥ 2 − 2 ϕ ( x i ) T ϕ ( x j ) = K ( x i , x i ) + K ( x j , x j ) − 2 K ( x i , x j ) \begin{aligned} \left\|\phi\left(\mathbf{x}_{i}\right)-\phi\left(\mathbf{x}_{j}\right)\right\|^{2} &=\left\|\phi\left(\mathbf{x}_{i}\right)\right\|^{2}+\left\|\phi\left(\mathbf{x}_{j}\right)\right\|^{2}-2 \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)+K\left(\mathbf{x}_{j}, \mathbf{x}_{j}\right)-2 K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} ∥ϕ(xi)−ϕ(xj)∥2=∥ϕ(xi)∥2+∥ϕ(xj)∥2−2ϕ(xi)Tϕ(xj)=K(xi,xi)+K(xj,xj)−2K(xi,xj)
2 K ( x , y ) = ∥ ϕ ( x ) ∥ 2 + ∥ ϕ ( y ) ∥ 2 − ∥ ϕ ( x ) − ϕ ( y ) ∥ 2 2 K\left(\mathbf{x}, \mathbf{y}\right)=\left\|\phi\left(\mathbf{x}\right)\right\|^{2}+\left\|\phi\left(\mathbf{y}\right)\right\|^{2}-\left\|\phi\left(\mathbf{x}\right)-\phi\left(\mathbf{y}\right)\right\|^{2} 2K(x,y)=∥ϕ(x)∥2+∥ϕ(y)∥2−∥ϕ(x)−ϕ(y)∥2
代表 ϕ ( x ) \phi\left(\mathbf{x}\right) ϕ(x) 与 ϕ ( y ) \phi\left(\mathbf{y}\right) ϕ(y) 的相似度
-
平均值: μ ϕ = 1 n ∑ i = 1 n ϕ ( x i ) \boldsymbol{\mu}_{\phi}=\frac{1}{n} \sum\limits_{i=1}^{n} \phi\left(\mathbf{x}_{i}\right) μϕ=n1i=1∑nϕ(xi)
∥ μ ϕ ∥ 2 = μ ϕ T μ ϕ = ( 1 n ∑ i = 1 n ϕ ( x i ) ) T ( 1 n ∑ j = 1 n ϕ ( x j ) ) = 1 n 2 ∑ i = 1 n ∑ j = 1 n ϕ ( x i ) T ϕ ( x j ) = 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \begin{aligned} \left\|\boldsymbol{\mu}_{\phi}\right\|^{2} &=\boldsymbol{\mu}_{\phi}^{T} \boldsymbol{\mu}_{\phi} \\ &=\left(\frac{1}{n} \sum\limits_{i=1}^{n} \phi\left(\mathbf{x}_{i}\right)\right)^{T}\left(\frac{1}{n} \sum\limits_{j=1}^{n} \phi\left(\mathbf{x}_{j}\right)\right) \\ &=\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right) \\ &=\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} ∥ ∥μϕ∥ ∥2=μϕTμϕ=(n1i=1∑nϕ(xi))T(n1j=1∑nϕ(xj))=n21i=1∑nj=1∑nϕ(xi)Tϕ(xj)=n21i=1∑nj=1∑nK(xi,xj) -
总方差: σ ϕ 2 = 1 n ∑ i = 1 n ∥ ϕ ( x i ) − μ ϕ ∥ 2 \sigma_{\phi}^{2}=\frac{1}{n} \sum\limits_{i=1}^{n}\left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2} σϕ2=n1i=1∑n∥ ∥ϕ(xi)−μϕ∥ ∥2, ∀ x i \forall \mathbf{x}_{i} ∀xi
∥ ϕ ( x i ) − μ ϕ ∥ 2 = ∥ ϕ ( x i ) ∥ 2 − 2 ϕ ( x i ) T μ ϕ + ∥ μ ϕ ∥ 2 = K ( x i , x i ) − 2 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) \begin{aligned} \left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2} &=\left\|\phi\left(\mathbf{x}_{i}\right)\right\|^{2}-2 \phi\left(\mathbf{x}_{i}\right)^{T} \boldsymbol{\mu}_{\phi}+\left\|\boldsymbol{\mu}_{\phi}\right\|^{2} \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n} \sum_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum_{s=1}^{n} \sum_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right) \end{aligned} ∥ ∥ϕ(xi)−μϕ∥ ∥2=∥ϕ(xi)∥2−2ϕ(xi)Tμϕ+∥ ∥μϕ∥ ∥2=K(xi,xi)−n2j=1∑nK(xi,xj)+n21s=1∑nt=1∑nK(xs,xt)σ ϕ 2 = 1 n ∑ i = 1 n ∥ ϕ ( x i ) − μ ϕ ∥ 2 = 1 n ∑ i = 1 n ( K ( x i , x i ) − 2 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) ) = 1 n ∑ i = 1 n K ( x i , x i ) − 2 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) = 1 n ∑ i = 1 n K ( x i , x i ) − 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \begin{aligned} \sigma_{\phi}^{2} &=\frac{1}{n} \sum\limits_{i=1}^{n}\left\|\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right\|^{2}\\ &=\frac{1}{n} \sum\limits_{i=1}^{n}\left(K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum\limits_{s=1}^{n} \sum\limits_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right)\right)\\ &=\frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{2}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)+\frac{1}{n^{2}} \sum\limits_{s=1}^{n} \sum\limits_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right)\\ &=\frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)-\frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) \end{aligned} σϕ2=n1i=1∑n∥ ∥ϕ(xi)−μϕ∥ ∥2=n1i=1∑n(K(xi,xi)−n2j=1∑nK(xi,xj)+n21s=1∑nt=1∑nK(xs,xt))=n1i=1∑nK(xi,xi)−n22i=1∑nj=1∑nK(xi,xj)+n21s=1∑nt=1∑nK(xs,xt)=n1i=1∑nK(xi,xi)−n21i=1∑nj=1∑nK(xi,xj)
1 n ∑ i = 1 n K ( x i , x i ) \frac{1}{n} \sum\limits_{i=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right) n1i=1∑nK(xi,xi) 是 K \mathbf{K} K 对角线平均值, 1 n 2 ∑ i = 1 n ∑ j = 1 n K ( x i , x j ) \frac{1}{n^{2}} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) n21i=1∑nj=1∑nK(xi,xj) 是 K \mathbf{K} K 所有元的平均值。
-
中心化核矩阵:令 ϕ ^ ( x i ) = ϕ ( x i ) − μ ϕ \hat{\phi}\left(\mathbf{x}_{i}\right)=\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi} ϕ^(xi)=ϕ(xi)−μϕ
中心核函数 K ^ ( x i , x j ) = ϕ ^ ( x i ) T ϕ ^ ( x j ) \hat{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) =\hat{\phi}\left(\mathbf{x}_{i}\right)^{T} \hat{\phi}\left(\mathbf{x}_{j}\right) K^(xi,xj)=ϕ^(xi)Tϕ^(xj)
K ^ ( x i , x j ) = ( ϕ ( x i ) − μ ϕ ) T ( ϕ ( x j ) − μ ϕ ) = ϕ ( x i ) T ϕ ( x j ) − ϕ ( x i ) T μ ϕ − ϕ ( x j ) T μ ϕ + μ ϕ T μ ϕ = K ( x i , x j ) − 1 n ∑ k = 1 n ϕ ( x i ) T ϕ ( x k ) − 1 n ∑ k = 1 n ϕ ( x j ) T ϕ ( x k ) + ∥ μ ϕ ∥ 2 = K ( x i , x j ) − 1 n ∑ k = 1 n K ( x i , x k ) − 1 n ∑ k = 1 n K ( x j , x k ) + 1 n 2 ∑ s = 1 n ∑ t = 1 n K ( x s , x t ) \begin{aligned} \hat{K}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right) &=\left(\phi\left(\mathbf{x}_{i}\right)-\boldsymbol{\mu}_{\phi}\right)^{T}\left(\phi\left(\mathbf{x}_{j}\right)-\boldsymbol{\mu}_{\phi}\right) \\ &=\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)-\phi\left(\mathbf{x}_{i}\right)^{T} \boldsymbol{\mu}_{\phi}-\phi\left(\mathbf{x}_{j}\right)^{T} \boldsymbol{\mu}_{\phi}+\boldsymbol{\mu}_{\phi}^{T} \boldsymbol{\mu}_{\phi}\\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)-\frac{1}{n} \sum_{k=1}^{n} \phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{k}\right)-\frac{1}{n} \sum_{k=1}^{n} \phi\left(\mathbf{x}_{j}\right)^{T} \phi\left(\mathbf{x}_{k}\right)+\left\|\boldsymbol{\mu}_{\phi}\right\|^{2} \\ &=K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)-\frac{1}{n} \sum_{k=1}^{n} K\left(\mathbf{x}_{i}, \mathbf{x}_{k}\right)-\frac{1}{n} \sum_{k=1}^{n} K\left(\mathbf{x}_{j}, \mathbf{x}_{k}\right)+\frac{1}{n^{2}} \sum_{s=1}^{n} \sum_{t=1}^{n} K\left(\mathbf{x}_{s}, \mathbf{x}_{t}\right) \end{aligned} K^(xi,xj)=(ϕ(xi)−μϕ)T(ϕ(xj)−μϕ)=ϕ(xi)Tϕ(xj)−ϕ(xi)Tμϕ−ϕ(xj)Tμϕ+μϕTμϕ=K(xi,xj)−n1k=1∑nϕ(xi)Tϕ(xk)−n1k=1∑nϕ(xj)Tϕ(xk)+∥ ∥μϕ∥ ∥2=K(xi,xj)−n1k=1∑nK(xi,xk)−n1k=1∑nK(xj,xk)+n21s=1∑nt=1∑nK(xs,xt)
故
K ^ = K − 1 n 1 n × n K − 1 n K 1 n × n + 1 n 2 1 n × n K 1 n × n = ( I − 1 n 1 n × n ) K ( I − 1 n 1 n × n ) \begin{aligned} \hat{\mathbf{K}} &=\mathbf{K}-\frac{1}{n} \mathbf{1}_{n \times n} \mathbf{K}-\frac{1}{n} \mathbf{K} \mathbf{1}_{n \times n}+\frac{1}{n^{2}} \mathbf{1}_{n \times n} \mathbf{K} \mathbf{1}_{n \times n} \\ &=\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right) \mathbf{K}\left(\mathbf{I}-\frac{1}{n} \mathbf{1}_{n \times n}\right) \end{aligned} K^=K−n11n×nK−n1K1n×n+n211n×nK1n×n=(I−n11n×n)K(I−n11n×n)
注意: 1 n × n \mathbf{1}_{n \times n} 1n×n 为全1矩阵,左乘之后每个元素为之前所在列的列和,右乘之和每个元素为之前所在行的行和,左右都乘之后每个元素即为原来所以元素之后。 -
归一化核矩阵
K n ( x i , x j ) = ϕ ( x i ) T ϕ ( x j ) ∥ ϕ ( x i ) ∥ ⋅ ∥ ϕ ( x j ) ∥ = K ( x i , x j ) K ( x i , x i ) ⋅ K ( x j , x j ) \mathbf{K}_{n}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=\frac{\phi\left(\mathbf{x}_{i}\right)^{T} \phi\left(\mathbf{x}_{j}\right)}{\left\|\phi\left(\mathbf{x}_{i}\right)\right\| \cdot\left\|\phi\left(\mathbf{x}_{j}\right)\right\|}=\frac{K\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)}{\sqrt{K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right) \cdot K\left(\mathbf{x}_{j}, \mathbf{x}_{j}\right)}} Kn(xi,xj)=∥ϕ(xi)∥⋅∥ϕ(xj)∥ϕ(xi)Tϕ(xj)=K(xi,xi)⋅K(xj,xj) K(xi,xj)
令 W = diag ( K ) = ( K ( x 1 , x 1 ) 0 ⋯ 0 0 K ( x 2 , x 2 ) ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ K ( x n , x n ) ) \mathbf{W}=\operatorname{diag}(\mathbf{K})=\left(\begin{array}{cccc} K\left(\mathbf{x}_{1}, \mathbf{x}_{1}\right) & 0 & \cdots & 0 \\ 0 & K\left(\mathbf{x}_{2}, \mathbf{x}_{2}\right) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & K\left(\mathbf{x}_{n}, \mathbf{x}_{n}\right) \end{array}\right) W=diag(K)=⎝ ⎛K(x1,x1)0⋮00K(x2,x2)⋮0⋯⋯⋱⋯00⋮K(xn,xn)⎠ ⎞,则 W − 1 / 2 ( x i , x i ) = 1 K ( x i , x i ) \mathbf{W}^{-1 / 2}\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)=\frac{1}{\sqrt{K\left(\mathbf{x}_{i}, \mathbf{x}_{i}\right)}} W−1/2(xi,xi)=K(xi,xi) 1,
K n = W − 1 / 2 ⋅ K ⋅ W − 1 / 2 \mathbf{K}_{n}=\mathbf{W}^{-1 / 2} \cdot \mathbf{K} \cdot \mathbf{W}^{-1 / 2} Kn=W−1/2⋅K⋅W−1/2
矩阵左乘右乘对角阵的性质。
5.4 复杂对象的核
5.4.1 字串的谱核
考虑字符集 Σ \Sigma Σ (有限),定义 l l l-谱特征映射:
ϕ l : Σ ∗ → R ∣ Σ ∣ l , ∀ x ∈ Σ ∗ , ϕ l ( x ) = ( ⋯ , # ( α ) , ⋯ ) T \phi_l: \Sigma^{*} \rightarrow \mathbb{R}^{|\Sigma|^l},\forall \mathbf{x} \in \Sigma^{*},\phi_l(\mathbf{x})=\left ( \cdots,\#(\alpha),\cdots \right )^T ϕl:Σ∗→R∣Σ∣l,∀x∈Σ∗,ϕl(x)=(⋯,#(α),⋯)T
其中 # ( α ) \#(\alpha) #(α) 代表长度为 l l l 的子字串在 x \mathbf{x} x 中出现的次数。
l l l-谱核函数: K l : Σ ∗ × Σ ∗ → R \mathbf{K}_l:\Sigma^{*} \times \Sigma^{*} \to \mathbb{R} Kl:Σ∗×Σ∗→R, K l ( x , y ) = ϕ l ( x ) T ϕ l ( y ) \mathbf{K}_l (\mathbf{x},\mathbf{y})=\phi_l(\mathbf{x})^{T} \phi_l(\mathbf{y}) Kl(x,y)=ϕl(x)Tϕl(y)
谱核函数:计算 l = 0 l=0 l=0 到 l = ∞ l=\infty l=∞
5.4.2 图顶点的扩散核
-
图:图 G = ( V , E ) G=(V,E) G=(V,E) 是指一个集合对,其中 V = { v 1 , ⋯ , v n } V=\{v_1,\cdots,v_n\} V={v1,⋯,vn} 为顶点集, E = { ( v i , v j ) } E=\{(v_i,v_j)\} E={(vi,vj)} 为边集。现只考虑无向简单(没有自己到自己的边)图。
-
邻接矩阵:图的邻接矩阵 A ( G ) : = [ A i j ] n × n A(G):=[A_{ij}]_{n\times n} A(G):=[Aij]n×n,其中 A i j = { 1 , ( v i , v j ) ∈ E 0 , ( v i , v j ) ∉ E A_{ij}=\left\{\begin{matrix} 1, (v_i,v_j)\in E\\ 0, (v_i,v_j) \notin E \end{matrix}\right. Aij={1,(vi,vj)∈E0,(vi,vj)∈/E
-
度矩阵: Δ ( G ) : = d i a g ( d 1 , ⋯ , d n ) \Delta (G):=diag(d_1,\cdots,d_n) Δ(G):=diag(d1,⋯,dn),其中 d i d_i di 代表顶点 v i v_i vi 的度,即与 v i v_i vi 相连的边的数目。
-
拉普拉斯矩阵: L ( G ) : = A ( G ) − Δ ( G ) L(G):=A(G)-\Delta(G) L(G):=A(G)−Δ(G)
负拉普拉斯矩阵: L ( G ) : = − L ( G ) L(G):=-L(G) L(G):=−L(G)
它们是实对称
常用图的对称相似性矩阵 S \mathbf{S} S 是指 A ( G ) A(G) A(G), L ( G ) L(G) L(G) 或 − L ( G ) -L(G) −L(G)。
问题:如何定义图顶点的核函数?( S \mathbf{S} S 并不一定是半正定)
- 幂核函数
以 S t \mathbf{S}^t St 作为核矩阵, S \mathbf{S} S 是对称的, t t t 为正整数
考虑 S 2 \mathbf{S}^2 S2: S 2 ( x i , x j ) = ∑ k = 1 n S i k S k j \mathbf{S}^2(x_i,x_j)=\sum\limits_{k=1}^{n}S_{ik}S_{kj} S2(xi,xj)=k=1∑nSikSkj
此公式说明 S 2 \mathbf{S}^2 S2 ( S l \mathbf{S}^l Sl) 的几何意义:顶点间长度为 2 2 2 ( l ) (l) (l) 的路径,描述顶点的相似性。
考虑 S l \mathbf{S}^l Sl 的特征值:设 S \mathbf{S} S 的特征值为 λ 1 , ⋯ , λ n ∈ R \lambda_1,\cdots,\lambda_n \in \mathbb{R} λ1,⋯,λn∈R,则
S = U ( λ 1 0 ⋯ 0 0 λ 2 ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n ) U T = ∑ i = 1 n λ i u i u i T \mathbf{S}=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \end{array}\right)\mathbf{U}^{T}=\sum_{i=1}^{n} \lambda_{i}\mathbf{u}_{i} \mathbf{u}_{i}^{T} S=U⎝ ⎛λ10⋮00λ2⋮0⋯⋯⋱⋯00⋮λn⎠ ⎞UT=i=1∑nλiuiuiT
其中 U \mathbf{U} U 是以相应特征向量为列的正交矩阵, U = ( ∣ ∣ ∣ u 1 u 2 ⋯ u n ∣ ∣ ∣ ) \mathbf{U}=\left(\begin{array}{cccc} \mid & \mid & & \mid \\ \mathbf{u}_{1} & \mathbf{u}_{2} & \cdots & \mathbf{u}_{n} \\ \mid & \mid & & \mid \end{array}\right) U=⎝ ⎛∣u1∣∣u2∣⋯∣un∣⎠ ⎞
S l = U ( λ 1 l 0 ⋯ 0 0 λ 2 l ⋯ 0 ⋮ ⋮ ⋱ ⋮ 0 0 ⋯ λ n l ) U T \mathbf{S}^l=\mathbf{U}\left(\begin{array}{cccc} \lambda_{1}^l & 0 & \cdots & 0 \\ 0 & \lambda_{2}^l & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n}^l \end{array}\right)\mathbf{U}^{T} Sl=U⎝ ⎛λ1l0⋮00λ2l⋮0⋯⋯⋱⋯00⋮λnl⎠ ⎞UT
λ 1 l , ⋯ , λ n l \lambda_1^l,\cdots,\lambda_n^l λ1l,⋯,λnl 是 S l \mathbf{S}^l Sl 的特征值。
故若 l l l 是偶数, S l \mathbf{S}^l Sl 半正定。
- 指数扩散核函数
以 K : = e β S \mathbf{K}:=e^{\beta \mathbf{S}} K:=eβS 为核矩阵,其中 β > 0 \beta >0 β>0 为阻尼系数。(泰勒展开: e β x = ∑ l = 0 ∞ 1 l ! β l x l e^{\beta x}=\sum\limits_{l=0}^{\infin}\frac{1}{l!}\beta^{l}x^l eβx=l=0∑∞l!1βlxl)
e β S = ∑ l = 0 ∞ 1 l ! β ′ S l = I + β S + 1 2 ! β 2 S 2 + 1 3 ! β 3 S 3 + ⋯ = ( ∑ i = 1 n u i u i T ) + ( ∑ i = 1 n u i β λ i u i T ) + ( ∑ i = 1 n u i 1 2 ! β 2 λ i 2 u i T ) + ⋯ = ∑ i = 1 n u i ( 1 + β λ i + 1 2 ! β 2 λ i 2 + ⋯ ) u i T = ∑ i = 1 n u i e β λ i u i T = U ( e β λ 1 0 ⋯ 0 0 e β λ 2 ⋯ 0 ⋮ ⋮ ⋱ 0 0 0 ⋯ e β λ n ) U T \begin{aligned} e^{\beta \mathbf{S}} &=\sum_{l=0}^{\infty} \frac{1}{l !} \beta^{\prime} \mathbf{S}^{l} \\ &=\mathbf{I}+\beta \mathbf{S}+\frac{1}{2 !} \beta^{2} \mathbf{S}^{2}+\frac{1}{3 !} \beta^{3} \mathbf{S}^{3}+\cdots\\ &=\left(\sum_{i=1}^{n} \mathbf{u}_{i} \mathbf{u}_{i}^{T}\right)+\left(\sum_{i=1}^{n} \mathbf{u}_{i} \beta \lambda_{i} \mathbf{u}_{i}^{T}\right)+\left(\sum_{i=1}^{n} \mathbf{u}_{i} \frac{1}{2 !} \beta^{2} \lambda_{i}^{2} \mathbf{u}_{i}^{T}\right)+\cdots \\ &=\sum_{i=1}^{n} \mathbf{u}_{i}\left(1+\beta \lambda_{i}+\frac{1}{2 !} \beta^{2} \lambda_{i}^{2}+\cdots\right) \mathbf{u}_{i}^{T} \\ &=\sum_{i=1}^{n} \mathbf{u}_{i} e ^{\beta \lambda_{i}} \mathbf{u}_{i}^{T} \\ &=\mathbf{U}\left(\begin{array}{cccc} e ^{\beta \lambda_{1}} & 0 & \cdots & 0 \\ 0 & e ^{\beta \lambda_{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & e ^{\beta \lambda_{n}} \end{array}\right) \mathbf{U}^{T} \end{aligned} eβS=l=0∑∞l!1β′Sl=I+βS+2!1β2S2+3!1β3S3+⋯=(i=1∑nuiuiT)+(i=1∑nuiβλiuiT)+(i=1∑nui2!1β2λi2uiT)+⋯=i=1∑nui(1+βλi+2!1β2λi2+⋯)uiT=i=1∑nuieβλiuiT=U⎝ ⎛eβλ10⋮00eβλ2⋮0⋯⋯⋱⋯000eβλn⎠ ⎞UT
故 K \mathbf{K} K 的特征值 e β λ 1 , ⋯ , e β λ n e ^{\beta \lambda_{1}},\cdots,e ^{\beta \lambda_{n}} eβλ1,⋯,eβλn 完全非负, K \mathbf{K} K 为半正定。
- 纽因曼扩散核函数
以 K = ∑ l = 0 ∞ β l S l \mathbf{K}=\sum\limits_{l=0}^{\infty} \beta^l \mathbf{S}^l K=l=0∑∞βlSl 为核矩阵,注意到
K = I + β S + β 2 S 2 + β 3 S 3 + ⋯ = I + β S ( I + β S + β 2 S 2 + ⋯ ) = I + β S K \begin{aligned} \mathbf{K} &=\mathbf{I}+\beta \mathbf{S}+\beta^{2} \mathbf{S}^{2}+\beta^{3} \mathbf{S}^{3}+\cdots \\ &=\mathbf{I}+\beta \mathbf{S}\left(\mathbf{I}+\beta \mathbf{S}+\beta^{2} \mathbf{S}^{2}+\cdots\right) \\ &=\mathbf{I}+\beta \mathbf{S} \mathbf{K} \end{aligned} K=I+βS+β2S2+β3S3+⋯=I+βS(I+βS+β2S2+⋯)=I+βSK
故
K − β S K = I ( I − β S ) K = I K = ( I − β S ) − 1 \begin{aligned} \mathbf{K}-\beta \mathbf{S} \mathbf{K} &=\mathbf{I} \\ (\mathbf{I}-\beta \mathbf{S}) \mathbf{K} &=\mathbf{I} \\ \mathbf{K} &=(\mathbf{I}-\beta \mathbf{S})^{-1} \end{aligned} K−βSK(I−βS)KK=I=I=(I−βS)−1
在 I − β S \mathbf{I}-\beta \mathbf{S} I−βS 可逆的前提下
K = ( U U T − U ( β Λ ) U T ) − 1 = ( U ( I − β Λ ) U T ) − 1 = U ( I − β Λ ) − 1 U T = U ( 1 1 − β λ 1 0 ⋯ 0 0 1 1 − β λ 2 ⋯ 0 ⋮ ⋮ ⋱ 0 0 0 ⋯ 1 1 − β λ n ) U T \begin{aligned} \mathbf{K} &=\left(\mathbf{U} \mathbf{U}^{T}-\mathbf{U}(\beta \mathbf{\Lambda}) \mathbf{U}^{T}\right)^{-1} \\ &=\left(\mathbf{U}(\mathbf{I}-\beta \mathbf{\Lambda}) \mathbf{U}^{T}\right)^{-1} \\ &=\mathbf{U}(\mathbf{I}-\beta \mathbf{\Lambda})^{-1} \mathbf{U}^{T}\\ &=\mathbf{U}\left(\begin{array}{cccc} \frac{1}{1-\beta\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{1-\beta\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \frac{1}{1-\beta\lambda_n} \end{array}\right) \mathbf{U}^{T} \end{aligned} K=(UUT−U(βΛ)UT)−1=(U(I−βΛ)UT)−1=U(I−βΛ)−1UT=U⎝ ⎛1−βλ110⋮001−βλ21⋮0⋯⋯⋱⋯0001−βλn1⎠ ⎞UT
要想其半正定,故有:
( 1 − β λ i ) − 1 ≥ 0 1 − β λ i ≥ 0 β λ i ≤ 1 \begin{aligned} \left(1-\beta \lambda_{i}\right)^{-1} & \geq 0 \\ 1-\beta \lambda_{i} & \geq 0 \\ \beta \lambda_{i} & \leq 1 \end{aligned} (1−βλi)−11−βλiβλi≥0≥0≤1