Scikit-learn's Pipeline: multi-label classification error, "a sparse matrix was passed"

Problem description

I am implementing several classifiers using different machine learning algorithms.

I'm sorting text files into categories, and do as follows:

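from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('TFIDF', TfidfTransformer()),
    ('clf', OneVsRestClassifier(GaussianNB()))])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)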

When I use the GaussianNB algorithm, fitting fails with an error reporting that a sparse matrix was passed.
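The failure can be reproduced outside the pipeline. The following is a minimal sketch, assuming a made-up toy corpus and labels (`docs` and `labels` are not from the original post): it checks that the TF-IDF step returns a SciPy sparse matrix and that GaussianNB refuses to fit on it.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB

# Toy corpus and labels, invented purely for illustration.
docs = ["cheap pills online", "meeting at noon", "cheap meeting room", "pills at noon"]
labels = [1, 0, 0, 1]

# Both text steps return SciPy sparse matrices.
counts = CountVectorizer().fit_transform(docs)
tfidf = TfidfTransformer().fit_transform(counts)
tfidf_is_sparse = issparse(tfidf)

try:
    GaussianNB().fit(tfidf, labels)
    fit_failed = False
except (TypeError, ValueError) as exc:
    # GaussianNB requires dense input, so scikit-learn rejects the sparse matrix.
    fit_failed = True
    print(type(exc).__name__, exc)
```

The exact wording of the message may differ between scikit-learn versions.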

I came across a post in which a class is created to perform the transformation of the data. Is it possible to adapt my code, which uses TfidfTransformer, in the same way? How can I fix this?

Recommended answer

You can do the following:

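from sklearn.base import BaseEstimator, TransformerMixin

class DenseTransformer(TransformerMixin, BaseEstimator):
    """Convert a sparse matrix into a dense array inside a pipeline."""

    def fit(self, X, y=None, **fit_params):
        # Nothing to learn; present only to satisfy the transformer API.
        return self

    def transform(self, X, y=None, **fit_params):
        # toarray() returns a NumPy ndarray; todense() returns np.matrix,
        # which newer scikit-learn releases no longer accept.
        return X.toarray()

    # fit_transform is inherited from TransformerMixin.

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('TFIDF', TfidfTransformer()),
    ('to_dense', DenseTransformer()),
    ('clf', OneVsRestClassifier(GaussianNB()))])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)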

Now, as part of your pipeline, the data will be transformed to a dense representation before it reaches GaussianNB.
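The same dense-conversion step can probably also be expressed with scikit-learn's built-in `FunctionTransformer` instead of a hand-written class. This is a minimal sketch of that alternative, not the answer's own method; the toy documents, the tag sets, and the helper name `to_dense` are assumptions made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MultiLabelBinarizer


def to_dense(X):
    """Convert a SciPy sparse matrix to a dense NumPy array (hypothetical helper)."""
    return X.toarray()


# Toy documents and tag sets, invented purely for illustration.
docs = ["cheap pills online", "team meeting at noon", "cheap meeting room", "pills at noon"]
tags = [["spam"], ["work"], ["spam", "work"], ["spam"]]
Y = MultiLabelBinarizer().fit_transform(tags)  # binary indicator matrix, one column per tag

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("clf", OneVsRestClassifier(GaussianNB())),
])
classifier.fit(docs, Y)
predicted = classifier.predict(["cheap pills at noon"])  # one row, one column per tag
```

Either way, densifying a large vocabulary can use a lot of memory, since every zero entry is stored explicitly.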

BTW, I don't know your constraints, but you may be able to use another classifier, such as RandomForestClassifier or an SVM, that does accept data in sparse representation.
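As a rough sketch of that suggestion, the pipeline below swaps GaussianNB for `LinearSVC` (one possible SVM choice, picked here as an assumption) and drops the dense-conversion step entirely. The toy documents and tag sets are again made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy documents and tag sets, invented purely for illustration.
docs = ["cheap pills online", "team meeting at noon", "cheap meeting room", "pills at noon"]
tags = [["spam"], ["work"], ["spam", "work"], ["spam"]]
Y = MultiLabelBinarizer().fit_transform(tags)  # binary indicator matrix, one column per tag

# No dense-conversion step: LinearSVC consumes the sparse TF-IDF matrix directly.
sparse_classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
sparse_classifier.fit(docs, Y)
predicted = sparse_classifier.predict(["cheap pills at noon"])  # one row, one column per tag
```

Keeping the data sparse avoids the memory cost of a dense matrix, which matters for text features with large vocabularies.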

