Problem description
I am normalizing my text input before running MultinomialNB in sklearn like this:
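from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Normalizer
vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
lsa = TruncatedSVD(n_components=100)
mnb = MultinomialNB(alpha=0.01)
train_text = vectorizer.fit_transform(raw_text_train)
# LSA: project the tf-idf vectors onto 100 SVD components (values can be negative)
train_text = lsa.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)
mnb.fit(train_text, train_labels)  # raises ValueError on negative input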
Unfortunately, MultinomialNB does not accept the negative values produced by the LSA stage. Any ideas for getting around this?
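A minimal sketch that reproduces the problem, assuming a small hypothetical corpus (`docs`, `labels`) in place of `raw_text_train` / `train_labels`, and `n_components=2` instead of 100 only because the toy vocabulary is tiny. It shows that the LSA output contains negative entries and that MultinomialNB refuses it with a `ValueError`.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for raw_text_train / train_labels.
# The documents share words ("cat", "dog", "food", "prices", ...).
docs = [
    "cat sat on the mat",
    "the cat chased a dog",
    "the dog ate cat food",
    "stock market prices rose",
    "the stock market fell",
    "dog food prices rose",
]
labels = [0, 0, 0, 1, 1, 1]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
# n_components reduced from 100 to 2 only because the toy vocabulary is tiny
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Components after the first must be orthogonal to it, so they mix positive
# and negative weights and the projected documents can have negative entries
has_negative = bool((X_lsa < 0).any())
print("negative entries after LSA:", has_negative)

try:
    MultinomialNB(alpha=0.01).fit(X_lsa, labels)
    rejected = False
except ValueError as err:
    # MultinomialNB checks that its input is non-negative (count-like)
    rejected = True
    print("MultinomialNB rejected the input:", err)
```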
Recommended answer
I recommend that you don't use Naive Bayes with SVD or other matrix factorizations, because Naive Bayes is based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Use another classifier instead, for example RandomForest.
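A sketch of that suggestion, assuming the same hypothetical toy corpus as a stand-in for the real data: the question's tf-idf, LSA and normalization steps are kept, but the final estimator is scikit-learn's `RandomForestClassifier`, which has no non-negativity requirement on its input. `n_components=2`, `n_estimators=100` and `random_state=0` are illustrative choices, not values from the answer.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for raw_text_train / train_labels
docs = [
    "cat sat on the mat",
    "the cat chased a dog",
    "the dog ate cat food",
    "stock market prices rose",
    "the stock market fell",
    "dog food prices rose",
]
labels = [0, 0, 0, 1, 1, 1]

# Same preprocessing as the question, but ending in a RandomForest,
# which accepts negative feature values
clf = make_pipeline(
    TfidfVectorizer(max_df=0.5, stop_words="english", use_idf=True),
    TruncatedSVD(n_components=2, random_state=0),  # 100 in the question
    Normalizer(copy=False),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(docs, labels)
predictions = clf.predict(["the cat ate dog food", "stock prices fell"])
print(predictions)
```

Wrapping the steps in a `Pipeline` also means the vectorizer, SVD and normalizer fitted on the training text are reused unchanged when predicting on new text.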
I tried this experiment, with the following results:
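from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Normalizer
vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
nmf = NMF(n_components=100)  # non-negative matrix factorization instead of SVD
mnb = MultinomialNB(alpha=0.01)
train_text = vectorizer.fit_transform(raw_text_train)
train_text = nmf.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)
mnb.fit(train_text, train_labels)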
This is the same setup, but using NMF (non-negative matrix factorization) instead of SVD, and I got 0.04% accuracy.
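A small sketch of why the NMF variant runs at all, assuming the same hypothetical toy corpus: NMF factors the non-negative tf-idf matrix into non-negative factors, so MultinomialNB accepts the reduced features. It only demonstrates that the fit succeeds; it says nothing about accuracy. The `init`, `max_iter` and `random_state` values are illustrative assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for raw_text_train / train_labels
docs = [
    "cat sat on the mat",
    "the cat chased a dog",
    "the dog ate cat food",
    "stock market prices rose",
    "the stock market fell",
    "dog food prices rose",
]
labels = [0, 0, 0, 1, 1, 1]

tfidf = TfidfVectorizer(max_df=0.5, stop_words="english", use_idf=True).fit_transform(docs)

# NMF keeps both factors >= 0, so the reduced document vectors are non-negative
nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
X_nmf = nmf.fit_transform(tfidf)
# L2-normalizing non-negative rows keeps them non-negative
X_nmf = Normalizer(copy=False).fit_transform(X_nmf)

mnb = MultinomialNB(alpha=0.01).fit(X_nmf, labels)  # no ValueError here
print("min feature value:", X_nmf.min())
print("training predictions:", mnb.predict(X_nmf))
```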
Replacing the MultinomialNB classifier with RandomForest, I got 79% accuracy.
So either change the classifier or don't apply matrix factorization.
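The second option can be sketched as follows, again assuming the hypothetical toy corpus: drop the SVD step and feed the tf-idf matrix straight to MultinomialNB. Tf-idf weights are non-negative, and scikit-learn's documentation notes that although the multinomial distribution normally expects integer counts, fractional counts such as tf-idf may also work in practice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for raw_text_train / train_labels
docs = [
    "cat sat on the mat",
    "the cat chased a dog",
    "the dog ate cat food",
    "stock market prices rose",
    "the stock market fell",
    "dog food prices rose",
]
labels = [0, 0, 0, 1, 1, 1]

# No matrix factorization: MultinomialNB works directly on the
# sparse, non-negative tf-idf features
clf = make_pipeline(
    TfidfVectorizer(max_df=0.5, stop_words="english", use_idf=True),
    MultinomialNB(alpha=0.01),
)
clf.fit(docs, labels)
predictions = clf.predict(["the cat ate dog food", "stock prices fell"])
print(predictions)
```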