Problem description
From my research, I found three conflicting results:

- SVC(kernel="linear") is better
- LinearSVC is better
- Doesn't matter

Can someone explain when to use LinearSVC vs. SVC(kernel="linear")?

It seems like LinearSVC is marginally better than SVC and is usually more finicky. But if scikit-learn decided to spend time implementing a specific case for linear classification, why wouldn't LinearSVC outperform SVC?

Recommended answer
Mathematically, optimizing an SVM is a convex optimization problem, usually with a unique minimizer. This means that there is only one solution to this mathematical optimization problem.
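
For reference, the soft-margin primal with hinge loss that SVC(kernel="linear") (libsvm) solves can be written as below, with $C$ the estimator's `C` parameter and labels $y_i \in \{-1, +1\}$. The $\tfrac12\lVert w\rVert_2^2$ term is strictly convex and the hinge sum is convex, so the optimal $w$ is unique; the intercept $b$ can fail to be unique in degenerate cases, which is one reason for the "usually" above.

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert_2^2 \;+\; C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i\,(w^\top x_i + b)\bigr)
$$
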
The differences in results come from several aspects. SVC and LinearSVC are supposed to optimize the same problem, but in fact all liblinear estimators penalize the intercept, whereas libsvm ones don't (IIRC). This leads to a different mathematical optimization problem and thus different results. There may also be other subtle differences, such as scaling and the default loss function (edit: make sure you set loss='hinge' in LinearSVC, whose default is loss='squared_hinge'). Next, in multiclass classification, liblinear does one-vs-rest by default, whereas libsvm does one-vs-one.
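
A minimal sketch of the differences above, assuming scikit-learn is installed and using synthetic `make_blobs` data chosen only for illustration. `LinearSVC` gets `loss="hinge"` explicitly and `dual=True`, because liblinear solves the hinge-loss, L2-penalized problem only in the dual. liblinear fits the intercept by appending a constant feature equal to `intercept_scaling` (call it $s$), so the intercept ends up penalized as $\tfrac12 (b/s)^2$; raising `intercept_scaling` should move `LinearSVC` closer to `SVC`, though the exact numbers depend on the data. The multiclass part makes one-vs-rest vs. one-vs-one visible through the shape of `coef_`.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

# --- Binary case: same hinge loss, different treatment of the intercept ---
# Synthetic, fairly well-separated data (illustrative only).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

# libsvm: hinge loss, intercept not penalized.
svc = SVC(kernel="linear", C=1.0).fit(X, y)

# liblinear: set loss="hinge" explicitly (default is "squared_hinge");
# hinge loss with an L2 penalty is only available with dual=True.
lin = LinearSVC(loss="hinge", C=1.0, dual=True, max_iter=100_000).fit(X, y)

# A larger intercept_scaling weakens the penalty on the intercept.
lin_scaled = LinearSVC(loss="hinge", C=1.0, dual=True,
                       intercept_scaling=100.0, max_iter=100_000).fit(X, y)

for name, model in [("SVC", svc), ("LinearSVC", lin),
                    ("LinearSVC, scaling=100", lin_scaled)]:
    print(f"{name:24s} coef={model.coef_.ravel()} intercept={model.intercept_}")

# --- Multiclass case: one-vs-rest (liblinear) vs one-vs-one (libsvm) ---
X4, y4 = make_blobs(n_samples=400, centers=4, random_state=0)
svc4 = SVC(kernel="linear").fit(X4, y4)
lin4 = LinearSVC(dual=True, max_iter=100_000).fit(X4, y4)

# libsvm trains one classifier per pair of classes: 4 * 3 / 2 = 6.
print("SVC coef_ shape:      ", svc4.coef_.shape)   # (6, 2)
# liblinear trains one classifier per class: 4.
print("LinearSVC coef_ shape:", lin4.coef_.shape)   # (4, 2)
```

liblinear may emit a `ConvergenceWarning` on some data; raising `max_iter` or loosening `tol` is the usual response.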
SGDClassifier(loss='hinge') is different from the other two in that it uses stochastic gradient descent rather than an exact solver, so it may not converge to the same solution. However, the obtained solution may generalize better.
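
A minimal sketch comparing `SGDClassifier(loss="hinge")` with `SVC(kernel="linear")`, assuming scikit-learn and synthetic data. Matching the regularization strengths is a stated assumption: dividing the SVM objective by $C\,n$ gives SGD's averaged form, so `alpha = 1 / (C * n_samples)` targets the same L2-regularized hinge objective (SGD, like libsvm, does not penalize the intercept). Features are standardized first because SGD is sensitive to feature scale. The two weight vectors typically come out close but not identical, and a different `random_state` gives a slightly different SGD solution.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, standardized data (SGD is sensitive to feature scaling).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

C = 1.0
svc = SVC(kernel="linear", C=C).fit(X, y)

# alpha = 1 / (C * n_samples) rescales the same L2 + hinge objective
# into SGD's averaged form: mean(hinge) + alpha / 2 * ||w||^2.
sgd = SGDClassifier(loss="hinge", penalty="l2", alpha=1.0 / (C * len(X)),
                    max_iter=1000, random_state=0).fit(X, y)

print("SVC weights:", svc.coef_.ravel(), svc.intercept_)
print("SGD weights:", sgd.coef_.ravel(), sgd.intercept_)
# Fraction of training points on which the two models agree.
print("agreement:", (svc.predict(X) == sgd.predict(X)).mean())
```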
Between SVC and LinearSVC, one important decision criterion is that LinearSVC tends to converge faster, and the advantage grows with the number of samples. This is because the linear kernel is a special case that liblinear is optimized for, but libsvm is not.
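
A rough timing sketch, assuming scikit-learn and synthetic `make_classification` data; absolute numbers depend on hardware, data, `C` and `tol`, so no specific timings are claimed here. It fits both estimators on growing training sets so the scaling trend can be checked on one's own machine. `LinearSVC` uses the hinge loss here so the comparison is about the solver rather than the loss. The helper `fit_seconds` is hypothetical, defined only for this sketch.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC


def fit_seconds(model, X, y):
    """Return wall-clock seconds spent in model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


timings = {}
for n in (500, 2_000):
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    timings[n] = {
        "SVC(kernel='linear')": fit_seconds(SVC(kernel="linear"), X, y),
        "LinearSVC(loss='hinge')": fit_seconds(
            LinearSVC(loss="hinge", dual=True, max_iter=10_000), X, y),
    }
    for name, sec in timings[n].items():
        print(f"n={n:>5}  {name:24s} {sec:.3f}s")
```

Extending the tuple of sample sizes is the simplest way to see how the gap evolves; libsvm's cost usually grows faster than linearly in `n`.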