How to edit the Weka configuration to find "1"

Problem description

I have an ARFF table with Boolean results.

Most of the rows end with "0" (about 95%). The "0" rows do not interest me; I want Weka to find the rows that end with "1".

Unfortunately, most of the algorithms just predict "0" all of the time, which does not help me at all.

How can I make Weka find only the "1" rows, if that is possible?

Recommended answer

I think you are describing the classical class imbalance problem. Almost every machine learning algorithm is designed to maximize accuracy. In your case, a model that assigns 0 every time achieves 95% accuracy, and from the algorithm's point of view that is hard to beat. (For more information, search for "unbalanced classes" or "class imbalance".) In cases like this, however, the minority class is the one of greater interest.
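To make the 95% figure concrete, here is a minimal Python sketch (not Weka code). The toy labels are an assumption that mirrors the 95/5 split described in the question, and the "classifier" simply predicts 0 for every row.

```python
# Hypothetical toy labels: 95 rows of class 0 and 5 rows of class 1,
# mirroring the split described in the question.
labels = [0] * 95 + [1] * 5

# A "classifier" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of rows where the prediction matches the label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall for class 1: fraction of the actual "1" rows that were found.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_positives = sum(1 for y in labels if y == 1)
recall_class_1 = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")
print(f"recall for class 1 = {recall_class_1:.2f}")
```

The accuracy looks high even though not a single "1" row is found, which is why accuracy alone is misleading here.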

A few quick solutions: upsample class 1 (the minority), downsample class 0 (the majority), or combine both to get a balanced dataset for training. You can use WEKA's SpreadSubsample filter for that. You can also have a look at the SMOTE filter and the MetaCost classifier.
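The idea behind downsampling the majority class can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Weka's SpreadSubsample implementation; the function name `undersample_to_balance` and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def undersample_to_balance(rows, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class.

    rows is a list of (features, label) pairs.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[1]].append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))

    # Shuffle so the classes are not stored in blocks.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 95 rows of class 0 and 5 rows of class 1.
rows = [((i,), 0) for i in range(95)] + [((i,), 1) for i in range(95, 100)]
balanced = undersample_to_balance(rows)
counts = {label: sum(1 for _, y in balanced if y == label) for label in (0, 1)}
print(counts)
```

Downsampling throws away most of the majority rows, so with very few minority rows it may be worth comparing against upsampling or SMOTE.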

If you are for some reason interested in accuracy, you have to test the classifier on the original distribution, so use SpreadSubsample inside a FilteredClassifier; that way the resampling affects only the training data. However, as you may have already noticed, if you are interested in the minority class, accuracy is not a very reliable indicator of model performance. Have a look at per-class recall, the ROC curve, and AUC instead. A great article about ROC is here: http://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf
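As a rough illustration of these metrics, the following Python sketch computes recall for class 1 and AUC by hand. It is not Weka's evaluation code; the labels, the scores, and the 0.5 threshold are made-up assumptions. AUC is computed here as the fraction of (positive, negative) pairs in which the positive row gets the higher score, with ties counted as half.

```python
def recall(labels, predictions, positive=1):
    """Fraction of actual positive rows that were predicted positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == positive and p == positive)
    fn = sum(1 for y, p in zip(labels, predictions) if y == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(labels, scores, positive=1):
    """Probability that a random positive row outscores a random negative row."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (higher score = more likely "1").
labels = [0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.05, 0.9, 0.15]

# Turn scores into hard predictions with an assumed threshold of 0.5.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print("recall for class 1:", recall(labels, predictions))
print("AUC:", auc(labels, scores))
```

Unlike recall, AUC does not depend on the chosen threshold, so it is useful for comparing models before deciding how aggressively to predict "1".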

Good luck.

