python - Facing ValueError: Target is multiclass but average='binary'

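I am new to Python and also new to machine learning. As per my requirement, I am trying to apply the Naive Bayes algorithm to my dataset.
I am able to find the accuracy, but I am also trying to find the precision and recall for the same. However, it throws the following error: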

   "choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.

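Can anyone suggest how I should proceed? I tried using the average parameter in the precision and recall scores. It runs without error, but it gives the same value for accuracy, precision, and recall.
My dataset:
train_data.csv: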

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test_data.csv:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My code:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])

    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

vec = CountVectorizer()

X_train_transformed =  vec.fit_transform(X_train)

X_test_transformed = vec.transform(X_test)

clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))

print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )

Best answer

You need to add the 'average' parameter. According to the documentation:
average : string, [None, 'binary' (default), 'micro', 'macro',
'samples', 'weighted']

This parameter is required for multiclass/multilabel targets. If None, the
scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:

Do the following:

print("Precision Score : ", precision_score(y_test, y_pred,
                                            pos_label='positive',
                                            average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))

Replace 'micro' with any of the above options except 'binary'. Also, in the multiclass setting there is no need to provide 'pos_label', since it will be ignored.
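
As a rough illustration of how the options differ, the sketch below compares them on a small set of three-class labels. The labels (including a 'neutral' class) are invented for this example and are not taken from the question's data.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical three-class labels, invented purely for illustration.
y_true = ['positive', 'negative', 'neutral', 'positive', 'negative', 'positive']
y_pred = ['positive', 'negative', 'positive', 'positive', 'neutral', 'negative']
labels = ['negative', 'neutral', 'positive']

# average=None returns one score per class, in the order given by `labels`.
per_class_precision = precision_score(y_true, y_pred, labels=labels, average=None)
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)
print("per-class precision:", dict(zip(labels, per_class_precision)))
print("per-class recall   :", dict(zip(labels, per_class_recall)))

# The averaging modes collapse the per-class scores into one number each.
scores = {}
for avg in ('micro', 'macro', 'weighted'):
    scores[avg] = (
        precision_score(y_true, y_pred, average=avg),
        recall_score(y_true, y_pred, average=avg),
    )
    print(avg, "precision/recall:", scores[avg])
```

If the score for one particular class is what you are after, average=None is probably the most direct route, since it keeps the classes separate. The 'samples' option is omitted here because it only applies to multilabel targets.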
Update for comment:
Yes, they can be equal. It is given in the user guide:
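Note that for "micro"-averaging in a multiclass setting with all
labels included will produce equal precision, recall and F, while
"weighted" averaging may produce an F-score that is not between
precision and recall.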
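
To see why the micro-averaged numbers coincide, it may help to count by hand. The following is a minimal pure-Python sketch, not scikit-learn's implementation; the function name micro_precision_recall and the toy labels are made up for illustration. Every misclassified sample is a false positive for the predicted class and a false negative for the true class, so the pooled FP and FN totals match.

```python
def micro_precision_recall(y_true, y_pred):
    """Pool TP/FP/FN over all classes, then compute precision and recall."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1          # correct prediction for this class
            elif p == label:
                fp += 1          # predicted this class, truth was another
            elif t == label:
                fn += 1          # truth was this class, predicted another
    # Assumes non-empty input; otherwise the denominators would be zero.
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical three-class labels, invented purely for illustration.
y_true = ['positive', 'negative', 'neutral', 'positive', 'negative', 'positive']
y_pred = ['positive', 'negative', 'positive', 'positive', 'neutral', 'negative']

precision, recall = micro_precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, accuracy)
```

With every label included, both denominators equal the number of samples, which is why all three values agree.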

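It may also be worth checking why a positive/negative dataset is detected as multiclass in the first place. One possible cause, assuming some reviews contain commas, is that line.strip().split(',') in load_data picks up a fragment of the review as the label. The sketch below is a hypothetical variant that splits on the last comma only and counts the distinct labels. The first sample review is invented, and load_data_last_comma is an illustrative name.

```python
import io
from collections import Counter


def load_data_last_comma(lines):
    """Read 'review,label' rows, treating only the last comma as the separator."""
    reviews, labels = [], []
    lines = iter(lines)
    next(lines, None)                    # skip the header row
    for line in lines:
        line = line.strip()
        if ',' not in line:              # skip blank or malformed rows
            continue
        review, label = line.rsplit(',', 1)
        reviews.append(review)
        labels.append(label.strip())
    return reviews, labels


# Hypothetical file content; the first review contains a comma on purpose.
sample = (
    "review,label\n"
    "Great picture, great sound,positive\n"
    "Picture is not clear,negative\n"
)

# What the original split(',') would take as the label (field index 1).
naive_labels = [row.strip().split(',')[1] for row in sample.splitlines()[1:]]
print("naive labels:", Counter(naive_labels))

reviews, labels = load_data_last_comma(io.StringIO(sample))
print("last-comma labels:", Counter(labels))
```

If the label counts for the real files show only 'positive' and 'negative' after such a check, the default average='binary' with pos_label='positive' should work without the error.
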
Regarding python - Facing ValueError: Target is multiclass but average='binary', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52269187/