Problem description
I am implementing several classifiers using different machine learning algorithms.
I am classifying text files, and I do it as follows:
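from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('TFIDF', TfidfTransformer()),
    ('clf', OneVsRestClassifier(GaussianNB())),
])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)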
When I use the GaussianNB algorithm, an error occurs saying that a sparse matrix was passed.
I saw a post in which a class is created to perform the transformation of the data. Is it possible to adapt my code, which uses TfidfTransformer, in the same way? How can I fix this?
Recommended answer
You can do the following:
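from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

class DenseTransformer(TransformerMixin, BaseEstimator):
    # Stateless step that turns a sparse matrix into a dense array.
    # fit_transform is inherited from TransformerMixin.
    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        # toarray() returns a NumPy ndarray; todense() returns np.matrix,
        # which recent scikit-learn versions reject.
        return X.toarray()

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('TFIDF', TfidfTransformer()),
    ('to_dense', DenseTransformer()),
    ('clf', OneVsRestClassifier(GaussianNB())),
])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)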
Now, as part of your pipeline, the data will be transformed to a dense representation before it reaches GaussianNB.
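If you would rather not write a custom class, the same dense-conversion step can probably be expressed with scikit-learn's built-in FunctionTransformer. The following is only a minimal sketch: the helper name to_dense and the toy documents and tags are made up for illustration and are not part of the original question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MultiLabelBinarizer


def to_dense(X):
    # Hypothetical helper: convert a SciPy sparse matrix to a NumPy array.
    return X.toarray()


# Hypothetical toy data standing in for X_train and Y.
docs = [
    "cheap flights to paris",
    "book a hotel in paris",
    "python pipeline raises an error",
    "sparse matrix error in python",
    "python script to book cheap flights",
]
tags = [["travel"], ["travel"], ["code"], ["code"], ["code", "travel"]]

# One indicator column per label, as multi-label classification expects.
Y = MultiLabelBinarizer().fit_transform(tags)

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("TFIDF", TfidfTransformer()),
    # Densify the TF-IDF features so GaussianNB accepts them.
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("clf", OneVsRestClassifier(GaussianNB())),
])
classifier.fit(docs, Y)
predicted = classifier.predict(docs)
```

Keep in mind that a dense copy of a large vocabulary can use a lot of memory, so this approach is best suited to small or medium-sized corpora.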
BTW, I don't know your constraints, but maybe you can use another classifier, such as RandomForestClassifier or an SVM, that does accept data in sparse representation.
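As a rough illustration of that suggestion, the sketch below swaps GaussianNB for LinearSVC, one of scikit-learn's SVM implementations that accepts sparse input, so no dense-conversion step is needed. The choice of LinearSVC and the toy documents and tags are assumptions made for the example, not something taken from the question.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for X_train and Y.
docs = [
    "cheap flights to paris",
    "book a hotel in paris",
    "python pipeline raises an error",
    "sparse matrix error in python",
    "python script to book cheap flights",
]
tags = [["travel"], ["travel"], ["code"], ["code"], ["code", "travel"]]
Y = MultiLabelBinarizer().fit_transform(tags)

sparse_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("TFIDF", TfidfTransformer()),
    # LinearSVC works directly on the sparse TF-IDF matrix.
    ("clf", OneVsRestClassifier(LinearSVC())),
])
sparse_classifier.fit(docs, Y)
predicted = sparse_classifier.predict(docs)

# The features reaching the classifier are still a SciPy sparse matrix.
counts = sparse_classifier.named_steps["vectorizer"].transform(docs)
features = sparse_classifier.named_steps["TFIDF"].transform(counts)
features_are_sparse = issparse(features)
```

Staying sparse usually keeps memory use much lower on text data, which is one reason to try this before densifying.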