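I am new to Python and also new to machine learning. As per my requirement, I am trying to apply the Naive Bayes algorithm to my dataset.
I can compute the accuracy, but when I try to compute precision and recall for the same model, it throws the following error: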
"choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
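Can anyone suggest how to proceed? I tried passing average to the precision and recall scores. That runs without errors, but then accuracy, precision, and recall all come out identical.
My dataset:
train_data.csv: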
review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative
test_data.csv:
review,label
The picture is clear and beautiful,positive
Picture is not clear,negative
My code:
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()  # skip the "review,label" header row
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])
    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

# Bag-of-words features: learn the vocabulary on the training set only
vec = CountVectorizer()
X_train_transformed = vec.fit_transform(X_train)
X_test_transformed = vec.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_transformed, y_train)
score = clf.score(X_test_transformed, y_test)  # mean accuracy on the test set
print("score of Naive Bayes algo is :", score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test, y_pred))
print("Precision Score : ", precision_score(y_test, y_pred, pos_label='positive'))
print("Recall Score :", recall_score(y_test, y_pred, pos_label='positive'))
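The error means scikit-learn found more than two distinct values across y_test and y_pred, even though the sample rows shown only contain positive and negative. A minimal diagnostic sketch, under two assumptions the question does not confirm: the full CSV files may contain additional labels, or some reviews may contain commas. Because load_data splits on every comma and takes field 1 as the label, a review with a comma would turn a text fragment into a spurious "label". The rows below are hypothetical; `type_of_target` (from `sklearn.utils.multiclass`) reports whether scikit-learn treats a label array as binary or multiclass, and `str.rsplit(',', 1)` splits only on the last comma.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical rows (not from the asker's files): the second review contains a comma.
rows = [
    "Colors & clarity is superb,positive",
    "Great price, but the picture is dull,negative",
]

# Splitting on every comma, as load_data does, picks up part of the review as the label.
naive_labels = [row.strip().split(',')[1] for row in rows]
print(naive_labels)  # ['positive', ' but the picture is dull']

# Splitting only on the last comma keeps the label column intact.
fixed_labels = [row.strip().rsplit(',', 1)[1] for row in rows]
print(fixed_labels)  # ['positive', 'negative']

# How scikit-learn classifies a label array.
print(type_of_target(['positive', 'negative', 'positive']))  # binary
print(type_of_target(['positive', 'negative', 'neutral']))   # multiclass
```

On the real data, calling `type_of_target(list(y_test) + list(y_pred))` or printing `set(y_test) | set(y_pred)` should show which case applies. If stray labels turn up, fixing the loader may make the default average='binary' usable again; if the data genuinely has three or more classes, an averaging option is needed.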
Best answer
You need to add the 'average' parameter. According to the documentation:
average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
Do this:
print("Precision Score : ", precision_score(y_test, y_pred,
                                           pos_label='positive',
                                           average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))
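To see how the choices for average differ, here is a small sketch on hypothetical three-class labels (not the asker's data). average=None returns one score per class, in sorted label order (negative, neutral, positive); 'micro' pools the true positives, false positives, and false negatives of all classes before dividing; 'macro' takes the unweighted mean of the per-class scores; 'weighted' weights each per-class score by that class's support (its number of true instances).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels with unequal class sizes: 3 positive, 2 negative, 1 neutral.
y_true = ['positive', 'positive', 'positive', 'negative', 'negative', 'neutral']
y_pred = ['positive', 'positive', 'negative', 'negative', 'neutral', 'neutral']

# One score per class, ordered as sorted labels: negative, neutral, positive.
print(precision_score(y_true, y_pred, average=None))
print(recall_score(y_true, y_pred, average=None))

# Single-number summaries under each averaging strategy.
for avg in ('micro', 'macro', 'weighted'):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    print(avg, round(p, 3), round(r, 3))
```

With these labels the loop prints micro 0.667 / 0.667, macro 0.667 / 0.722, and weighted 0.75 / 0.667. The micro pair equals the accuracy (4 correct out of 6), while macro and weighted precision differ from recall.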
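Replace 'micro' with any of the options above except 'binary'. Also, in a multiclass setting there is no need to provide 'pos_label', since it will be ignored.

Update for the comment:

Yes, they can be equal. This is stated in the user guide here: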
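Note that for "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall.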
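The reason the numbers coincide: in a single-label multiclass problem, every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class). Summed over all classes, FP therefore equals FN, so micro precision TP/(TP+FP) equals micro recall TP/(TP+FN), and both reduce to correct/total, which is accuracy. A small check on hypothetical labels:

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical single-label predictions over three classes (3 of 5 correct).
y_true = ['positive', 'negative', 'neutral', 'negative', 'positive']
y_pred = ['positive', 'neutral', 'neutral', 'negative', 'negative']

acc = accuracy_score(y_true, y_pred)
p = precision_score(y_true, y_pred, average='micro')
r = recall_score(y_true, y_pred, average='micro')
f = f1_score(y_true, y_pred, average='micro')

# All four reduce to (correct predictions) / (total predictions).
print(acc, p, r, f)
print(all(math.isclose(acc, m) for m in (p, r, f)))  # True
```

Macro averaging generally gives precision and recall values that differ from each other and from accuracy. Weighted recall, however, also reduces to accuracy in this setting, because each class's recall is weighted by exactly its share of the true labels.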
Source: python - Facing ValueError: Target is multiclass but average='binary', a question on Stack Overflow: https://stackoverflow.com/questions/52269187/