Problem description
I have a question regarding the weighted average in sklearn.metrics.f1_score:
sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)
Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
First, is there any reference that justifies the use of weighted-F1? I am just curious in which cases I should use weighted-F1.
Second, I heard that weighted-F1 is deprecated. Is that true?
Third, how is weighted-F1 actually calculated? For example:
{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}
How do I calculate the weighted-F1 of the above example? I thought it should be something like (0.8*2/3 + 0.4*1/3)/3, but I was wrong.
Recommended answer
I don't have any references, but if you're interested in multi-label classification where you care about the precision/recall of all classes, then the weighted f1-score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.
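To illustrate why the weighted score can be a poor fit when only the positive class matters, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration, and the helper names `f1_for_label` and `weighted_f1` are hypothetical, not part of scikit-learn. With eight negatives and two positives, the weighted score is dominated by the negative class:

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Assumption: an undefined F1 (no true or predicted instances) counts as 0.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average of per-label F1, weighted by each label's share of y_true."""
    n = len(y_true)
    return sum(
        y_true.count(label) / n * f1_for_label(y_true, y_pred, label)
        for label in sorted(set(y_true))
    )


# Made-up, imbalanced binary data: 8 negatives, 2 positives.
y_true = [0] * 8 + [1, 1]
y_pred = [0] * 8 + [1, 0]  # one of the two positives is missed

positive_f1 = f1_for_label(y_true, y_pred, 1)
weighted = weighted_f1(y_true, y_pred)
print(f"positive-class F1: {positive_f1:.3f}")
print(f"weighted F1:       {weighted:.3f}")
```

Here the positive-class F1 is about 0.67 while the weighted F1 is about 0.89, so the weighted figure hides the fact that half of the positives were missed.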
No, weighted-F1 itself is not being deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make it more explicit in previously ambiguous situations. (See the historical discussion on GitHub, or check out the source code and search the page for "deprecated" to find details.)
From the documentation of f1_score:
``'weighted'``:
Calculate metrics for each label, and find their average, weighted
by support (the number of true instances for each label). This
alters 'macro' to account for label imbalance; it can result in an
F-score that is not between precision and recall.
So the average is weighted by the support, which is the number of true samples with a given label. Your example data above does not list the support, so the weighted f1 score cannot be computed from the per-label F1 values alone; you also need the number of true instances of each label.
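As a rough sketch of the calculation, the following plain-Python snippet applies the documented definition to the counts from the question. It rests on two assumptions that the question does not confirm: that the TP/FP/FN values are per-label (one-vs-rest) counts, so that a label's support can be taken as TP + FN, and that the undefined F1 reported as -1 for label "1" should count as 0. The helper name `f1_from_counts` is hypothetical.

```python
# Per-label counts copied from the question.
counts = {
    "0": {"TP": 2, "FP": 1, "FN": 0},
    "1": {"TP": 0, "FP": 2, "FN": 2},
    "2": {"TP": 1, "FP": 1, "FN": 2},
}


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0 if the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


per_label_f1 = {
    label: f1_from_counts(c["TP"], c["FP"], c["FN"])
    for label, c in counts.items()
}

# Assumption: support (true instances of a label) equals TP + FN.
support = {label: c["TP"] + c["FN"] for label, c in counts.items()}
total = sum(support.values())

# Weighted average: each label's F1 times its share of the true instances.
weighted_f1_value = sum(
    support[label] / total * per_label_f1[label] for label in counts
)

# Unweighted ('macro') average, for comparison.
macro_f1_value = sum(per_label_f1.values()) / len(per_label_f1)

print("support:", support)
print(f"weighted F1: {weighted_f1_value:.3f}")
print(f"macro F1:    {macro_f1_value:.3f}")
```

Under these assumptions the supports are 2, 2 and 3, and the weights are 2/7, 2/7 and 3/7 rather than the 2/3 and 1/3 tried in the question. The weights already sum to 1, so there is no further division by the number of labels.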