Question
I am getting quite different results when classifying text (into only two categories) with the Bernoulli Naive Bayes algorithm in NLTK and the one in the scikit-learn module. Although the overall accuracy is comparable between the two (though far from identical), the difference in Type I and Type II errors is significant. In particular, the NLTK Naive Bayes classifier gives more Type I than Type II errors, while scikit-learn gives the opposite. This 'anomaly' seems to be consistent across different features and different training samples. Is there a reason for this? Which of the two is more trustworthy?
Answer
NLTK does not implement Bernoulli Naive Bayes. It implements multinomial Naive Bayes, but only allows binary features. A multinomial model over binary presence indicators is not the same thing as a Bernoulli model: Bernoulli Naive Bayes explicitly factors in the probability that a feature is absent, while a multinomial model only accumulates evidence from features that occur. That difference in how absence is weighed can shift the balance between Type I and Type II errors even when overall accuracy stays similar.
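For reference, here is a minimal sketch that trains scikit-learn's BernoulliNB and NLTK's NaiveBayesClassifier on the same binary presence/absence features. The vocabulary, documents, and labels below are made up purely for illustration:

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    import nltk

    # Toy vocabulary and binary presence/absence vectors for four "documents".
    vocab = ["cheap", "meeting", "viagra"]
    X_train = np.array([
        [1, 0, 1],   # spam
        [1, 0, 0],   # spam
        [0, 1, 0],   # ham
        [0, 1, 1],   # ham
    ])
    y_train = ["spam", "spam", "ham", "ham"]
    X_test = np.array([[1, 1, 0]])

    # scikit-learn: a true Bernoulli model; absent features are penalized too.
    sk_clf = BernoulliNB().fit(X_train, y_train)
    print("sklearn:", sk_clf.predict(X_test), sk_clf.predict_proba(X_test))

    # NLTK: the same data as feature dicts. Including explicit False values
    # is the closest you can get to the Bernoulli setup in NLTK.
    def to_featureset(row):
        return {word: bool(value) for word, value in zip(vocab, row)}

    train_set = [(to_featureset(row), label)
                 for row, label in zip(X_train, y_train)]
    nltk_clf = nltk.NaiveBayesClassifier.train(train_set)
    print("NLTK:", nltk_clf.classify(to_featureset(X_test[0])))

Even with explicit False features, NLTK's per-feature probability estimation and smoothing differ from scikit-learn's, so the two classifiers can still disagree on examples near the decision boundary.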