


I am new to the machine learning field and am trying to get a grasp of how the most common learning algorithms work and when to apply each of them. At the moment I am learning how Support Vector Machines work, and I have a question about custom kernel functions.
There is plenty of information on the web about the more standard (linear, RBF, polynomial) kernels for SVMs. I would, however, like to understand when it is reasonable to go for a custom kernel function. My questions are:


1) What are other possible kernels for SVMs?
2) In which situations would one apply custom kernels?
3) Can a custom kernel substantially improve the prediction quality of an SVM?



There are infinitely many of these; see, for example, the list of kernels implemented in pykernels (which is far from exhaustive):


  • Linear
  • Polynomial
  • RBF
  • Cosine similarity
  • Exponential
  • Laplacian
  • Rational quadratic
  • Inverse multiquadratic
  • Cauchy
  • T-Student
  • ANOVA
  • Additive Chi^2
  • Chi^2
  • MinMax
  • Min/Histogram intersection
  • Generalized histogram intersection
  • Spline
  • Sorensen
  • Tanimoto
  • Wavelet
  • Fourier
  • Log (CPD)
  • Power (CPD)
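To make "custom kernel" concrete, here is a minimal sketch of two kernels from the list written as plain functions, plus a helper that builds the Gram matrix an SVM solver works with. The function names, the `gamma` parameter, and the toy histograms are my own assumptions for illustration; they are not taken from pykernels, and the Laplacian form used here (L1 distance) is just one common definition.

```python
import math


def laplacian_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||_1), one common form of the Laplacian kernel."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))


def histogram_intersection_kernel(x, y):
    """Sum of element-wise minima; intended for non-negative histograms."""
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(kernel, X, Y=None):
    """Evaluate `kernel` on every pair, giving a len(X) x len(Y) nested list."""
    Y = X if Y is None else Y
    return [[kernel(x, y) for y in Y] for x in X]


# Made-up toy histograms (each row sums to 1), purely for illustration.
X = [[0.5, 0.5, 0.0],
     [0.1, 0.2, 0.7],
     [0.3, 0.3, 0.4]]

K_hist = gram_matrix(histogram_intersection_kernel, X)
K_lap = gram_matrix(laplacian_kernel, X)
```

A matrix like `K_hist` is what you would hand to an SVM implementation that supports precomputed kernels; scikit-learn's `SVC`, for instance, accepts `kernel="precomputed"` or a callable that returns the Gram matrix.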


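Custom kernels are worth applying in two kinds of situation:

  • the "simple" kernels give very bad results;
  • the data is specific in some sense, so that applying a traditional kernel would require degrading it. For example, if your data comes as graphs, you cannot apply an RBF kernel, because a graph is not a fixed-size vector; you need a graph kernel to work with such objects without some information-losing projection. Sometimes you also have insight into the data and know about some underlying structure that might help the classifier. Periodicity is one such example: if you know there is a recurring effect in your data, it may be worth looking for a specific kernel.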
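Both cases can be sketched in a few lines. The block below shows (a) a very simple graph kernel that compares vertex-degree histograms, so graphs of different sizes can be compared directly, and (b) a periodic kernel of the exp-sine-squared form. Both are illustrative choices of mine rather than kernels named in the answer; the graph representation (a dict mapping each vertex to its set of neighbours) and the parameter names `period` and `length_scale` are assumptions.

```python
import math
from collections import Counter


def degree_histogram(graph):
    """Count vertices per degree; `graph` maps vertex -> set of neighbours."""
    return Counter(len(neighbours) for neighbours in graph.values())


def degree_histogram_kernel(g1, g2):
    """Dot product of the two degree histograms (an explicit feature map)."""
    h1, h2 = degree_histogram(g1), degree_histogram(g2)
    # Counter returns 0 for degrees that do not occur in g2.
    return sum(count * h2[degree] for degree, count in h1.items())


def periodic_kernel(x, y, period=1.0, length_scale=1.0):
    """exp(-2 sin^2(pi |x - y| / period) / length_scale^2) for scalar inputs."""
    s = math.sin(math.pi * abs(x - y) / period)
    return math.exp(-2.0 * s * s / length_scale ** 2)


# Two toy graphs with different numbers of vertices.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

k_graphs = degree_histogram_kernel(triangle, path4)
k_same_phase = periodic_kernel(0.2, 1.2)   # one full period apart
k_half_phase = periodic_kernel(0.0, 0.5)   # half a period apart
```

The degree-histogram kernel discards most of the graph structure, so treat it as a toy; real graph kernels (random walk, Weisfeiler-Lehman and so on) follow the same idea with richer feature maps. The periodic kernel treats points a whole period apart as nearly identical, which is the kind of prior knowledge an RBF kernel cannot express.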


Yes. In particular, there always exists a (hypothetical) Bayesian optimal kernel, defined as:

K(x, y) = 1 iff arg max_l P(l|x) == arg max_l P(l|y), and K(x, y) = 0 otherwise


In other words, if one has the true probability P(l|x) of label l being assigned to a point x, then one can create a kernel that essentially maps the data points onto one-hot encodings of their most probable labels, thus leading to Bayes optimal classification (as it attains the Bayes risk).
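A toy sketch of this construction, assuming we somehow know the posterior. Here `toy_posterior` is a made-up logistic model over a 1-D input with two labels; it stands in for the true P(l|x), which in a real problem is exactly what we do not have. All names are hypothetical.

```python
import math

LABELS = ("neg", "pos")


def toy_posterior(x):
    """Hypothetical P(l|x) for a scalar x: logistic in x (an assumption)."""
    p_pos = 1.0 / (1.0 + math.exp(-x))
    return {"neg": 1.0 - p_pos, "pos": p_pos}


def most_probable_label(x, posterior=toy_posterior):
    """arg max_l P(l|x)."""
    probs = posterior(x)
    return max(probs, key=probs.get)


def one_hot(x, posterior=toy_posterior):
    """Feature map: one-hot encoding of the most probable label of x."""
    best = most_probable_label(x, posterior)
    return [1 if label == best else 0 for label in LABELS]


def bayes_optimal_kernel(x, y, posterior=toy_posterior):
    """1 if x and y share their most probable label, 0 otherwise."""
    same = most_probable_label(x, posterior) == most_probable_label(y, posterior)
    return 1 if same else 0


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

For any pair of points, `dot(one_hot(x), one_hot(y))` equals `bayes_optimal_kernel(x, y)`, which is the sense in which the kernel maps points onto one-hot encodings of their most probable labels.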


In practice it is of course impossible to obtain such a kernel, as having it would mean you had already solved your problem. However, it shows that there is a notion of an "optimal kernel", and obviously none of the classical ones is of this type (unless your data comes from very simple distributions). Furthermore, each kernel is a kind of prior over decision functions: the closer the induced family of functions gets to the actual one, the more probable it is that the SVM yields a reasonable classifier.

