




Suppose I'm working on some classification problem. (Fraud detection and comment spam are two problems I'm working on right now, but I'm curious about any classification task in general.)

How do I know which classifier I should use?

  1. Decision tree
  2. SVM
  3. Bayesian
  4. Neural network
  5. K-nearest neighbors
  6. Q-learning
  7. Genetic algorithm
  8. Markov decision processes
  9. Convolutional neural networks
  10. Linear regression or logistic regression
  11. Boosting, bagging, ensembling
  12. Random hill climbing or simulated annealing
  13. ...

In which cases is one of these the "natural" first choice, and what are the principles for choosing that one?

Examples of the type of answers I'm looking for (from Manning et al.'s Introduction to Information Retrieval book):

a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes).

I'm guessing this is because a higher-bias classifier will have lower variance, which is good because of the small amount of data.

b. If you have a ton of data, then the classifier doesn't really matter so much, so you should probably just choose a classifier with good scalability.
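The bias/variance intuition behind (a) can be checked with a small simulation. The sketch below is only an illustration under made-up assumptions: the data is synthetic (two overlapping 1-D Gaussians), a nearest-class-mean rule stands in for a high-bias model (it is not Naive Bayes itself), 1-nearest-neighbor stands in for a low-bias, high-variance model, and every name in it (`sample`, `fit_class_means`, `fit_one_nn`, `compare`) is hypothetical.

```python
import random


def sample(n_per_class, rng):
    """Draw n_per_class points per class: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1)."""
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(1.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def fit_class_means(train):
    """High-bias stand-in: predict the class whose training mean is closer."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def fit_one_nn(train):
    """High-variance stand-in: copy the label of the nearest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def compare(n_per_class=5, n_trials=200, n_test=500, seed=0):
    """Train both models on many tiny training sets and score them on one test set."""
    rng = random.Random(seed)
    test = sample(n_test // 2, rng)
    train_sets = [sample(n_per_class, rng) for _ in range(n_trials)]
    report = {}
    for name, fit in (("class_means", fit_class_means), ("one_nn", fit_one_nn)):
        accuracies = []
        votes_for_1 = [0] * len(test)  # how often each test point is labeled 1
        for train in train_sets:
            predict = fit(train)
            preds = [predict(x) for x, _ in test]
            hits = sum(p == y for p, (_, y) in zip(preds, test))
            accuracies.append(hits / len(test))
            votes_for_1 = [v + p for v, p in zip(votes_for_1, preds)]
        rates = [v / n_trials for v in votes_for_1]
        report[name] = {
            "mean_accuracy": sum(accuracies) / len(accuracies),
            # Variance of the predicted label across training sets,
            # averaged over test points: r * (1 - r) for a 0/1 prediction.
            "prediction_variance": sum(r * (1 - r) for r in rates) / len(rates),
        }
    return report


if __name__ == "__main__":
    for name, stats in compare().items():
        print(name, stats)
```

With only five training points per class, the constrained model's predictions should change much less from one training set to the next, which is the sense in which a higher-bias classifier "has lower variance". Raising `n_per_class` is a quick way to see how the gap changes as data grows.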

  1. What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.

  2. Also, for a somewhat separate question, besides standard Bayesian classifiers, are there 'standard state-of-the-art' methods for comment spam detection (as opposed to email spam)?


First, pin down the problem: the right choice depends on what kind of data you have and what task you want to solve.

Each broad approach (classification, regression, clustering, dimensionality reduction) contains several algorithms, and which one to pick depends largely on the size of your dataset.

Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
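As a rough illustration, the classification branch of that map can be paraphrased as a small decision function. This is a simplified sketch, not part of scikit-learn: the function name `suggest_classifiers` and its parameters are hypothetical, the returned strings are just labels, and the sample-size thresholds (50 and 100K) are the chart's rules of thumb rather than hard limits.

```python
def suggest_classifiers(n_samples, has_labels=True, is_text=False):
    """Return estimator families to try, in order, as a first guess.

    Simplified paraphrase of the classification branch of the
    scikit-learn "choosing the right estimator" map.
    """
    if n_samples <= 50:
        # The chart's first check: too little data for any estimator.
        return ["get more data"]
    if not has_labels:
        # Without labels this is a clustering problem, not classification.
        return ["clustering"]
    if n_samples >= 100_000:
        # Large datasets: start with a scalable linear model.
        return ["SGDClassifier", "kernel approximation"]
    # Under 100K samples: start linear, then fall back by data type.
    if is_text:
        fallbacks = ["naive Bayes"]
    else:
        fallbacks = ["KNeighborsClassifier", "SVC", "ensemble classifiers"]
    return ["LinearSVC"] + fallbacks


if __name__ == "__main__":
    # Hypothetical example: 20,000 labeled comments for spam detection.
    print(suggest_classifiers(20_000, is_text=True))
```

Each list is ordered the way the chart suggests trying things: move to the next entry only if the previous one does not work well enough on held-out data.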

