Question
Suppose I'm working on some classification problem. (Fraud detection and comment spam are two problems I'm working on right now, but I'm curious about any classification task in general.)
How do I know which classifier I should use?
- Decision tree
- SVM
- Bayesian
- Neural network
- K-nearest neighbors
- Q-learning
- Genetic algorithm
- Markov decision processes
- Convolutional neural networks
- Linear regression or logistic regression
- Boosting, bagging, ensembling
- Random hill climbing or simulated annealing
- ...
In which cases is one of these the "natural" first choice, and what are the principles for choosing that one?
Examples of the type of answers I'm looking for (from Manning et al.'s Introduction to Information Retrieval book):
a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes).
I'm guessing this is because a higher-bias classifier will have lower variance, which is good because of the small amount of data.
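b. If you have a ton of data, then the classifier doesn't really matter so much, so you should probably just choose a classifier with good scalability.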
What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.
Also, as a somewhat separate question: besides standard Bayesian classifiers, are there "standard state-of-the-art" methods for comment spam detection (as opposed to email spam)?
Recommended answer
First of all, you need to identify your problem. The right approach depends on what kind of data you have and what your desired task is.
If you are predicting a category:
- You have labeled data
  - You need to follow the classification approach and its algorithms
- You don't have labeled data
  - You need to go for the clustering approach

If you are predicting a quantity:
- You need to go for the regression approach

Otherwise:
- You can go for the dimensionality reduction approach
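The branching above can be sketched as a small function. This is only an illustration: the name `choose_approach` and the task labels `"category"` and `"quantity"` are hypothetical, and the rule that unlabeled category data leads to clustering is assumed from the scikit-learn map cited as the source below.

```python
def choose_approach(task: str, has_labels: bool = False) -> str:
    """Return the broad family of methods for a task.

    `task` is a hypothetical label: "category", "quantity", or anything else.
    `has_labels` only matters when predicting a category.
    """
    if task == "category":
        # Labeled examples allow supervised classification;
        # without labels, the assumed fallback is clustering.
        return "classification" if has_labels else "clustering"
    if task == "quantity":
        return "regression"
    # Neither a category nor a quantity: explore the data's structure instead.
    return "dimensionality reduction"


if __name__ == "__main__":
    # Fraud detection with labeled transactions is a category task with labels.
    print(choose_approach("category", has_labels=True))
```

For the fraud detection and comment spam problems in the question, labeled examples would put both in the classification branch.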
There are different algorithms within each approach mentioned above. The choice of a particular algorithm depends upon the size of the dataset.
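As a rough illustration of how dataset size can drive the choice, here is a sketch of the classification branch as I recall it from the scikit-learn map. The thresholds (50 and 100,000 samples) and the suggested estimators are assumptions based on that chart, and `suggest_classifiers` is a hypothetical helper, not a scikit-learn function.

```python
from typing import List


def suggest_classifiers(n_samples: int, is_text: bool = False) -> List[str]:
    """Return candidate classifiers to try, in order, for labeled data.

    The cutoffs below are assumed from the scikit-learn estimator map
    and should be read as rough guidance, not hard rules.
    """
    if n_samples < 50:
        # Too little data to fit anything reliably: collect more first.
        return []
    if n_samples < 100_000:
        # Start with a linear SVM; the fallback depends on the data type.
        fallback = "naive Bayes" if is_text else "KNeighborsClassifier"
        return ["LinearSVC", fallback]
    # Large datasets: prefer a model trained with stochastic gradient descent.
    return ["SGDClassifier"]


if __name__ == "__main__":
    # A few thousand labeled comments for spam detection (text data).
    print(suggest_classifiers(5_000, is_text=True))
```

The function returns an ordered list rather than a single name because the map itself is a sequence of "try this, and if it does not work, try that" steps.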
Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/