Given an attribute index, WEKA-generated models seem unable to predict the class and distribution

Problem description


Overview

I am using the WEKA API 3.7.10 (developer version) to use my pre-made .model files.

I made 25 models: five outcome variables for each of five algorithms (J48, alternating decision tree (ADTree), random forest, LogitBoost, and random subspace).

I am having problems with J48, random subspace, and random forest.

Necessary files

The following is the ARFF representation of my data after creation:

@relation WekaData

@attribute ageDiagNum numeric
@attribute raceGroup {Black,Other,Unknown,White}
@attribute stage3 {0,I,IIA,IIB,IIIA,IIIB,IIIC,IIINOS,IV,'UNK Stage'}
@attribute m3 {M0,M1,MX}
@attribute reasonNoCancerSurg {'Not performed, patient died prior to recommended surgery','Not recommended','Not recommended, contraindicated due to other conditions','Recommended but not performed, patient refused','Recommended but not performed, unknown reason','Recommended, unknown if performed','Surgery performed','Unknown; death certificate or autopsy only case'}
@attribute ext2 {00,05,10,11,13,14,15,16,17,18,20,21,23,24,25,26,27,28,30,31,33,34,35,36,37,38,40,50,60,70,80,85,99}
@attribute time2 {}
@attribute time4 {}
@attribute time6 {}
@attribute time8 {}
@attribute time10 {}

@data
65,White,IIA,MX,'Not recommended, contraindicated due to other conditions',14,?,?,?,?,?

I need to get the binary attributes time2 to time10 from their respective models.


Below are snippets of the code I use to get the predictions from all the model files:

private static Map<String, Object> predict(Instances instances,
        Classifier classifier, int attributeIndex) {
    Map<String, Object> map = new LinkedHashMap<String, Object>();
    int instanceIndex = 0; // do not change, equal to row 1
    double[] percentage = { 0 };
    double outcomeValue = 0;
    AbstractOutput abstractOutput = null;

    if(classifier.getClass() == RandomForest.class || classifier.getClass() == RandomSubSpace.class) {
        // has problems predicting time2 to time10
        instances.setClassIndex(5);
    } else {
        // works as intended in LogitBoost and ADTree
        instances.setClassIndex(attributeIndex);
    }

    try {
        outcomeValue = classifier.classifyInstance(instances.instance(0));
        percentage = classifier.distributionForInstance(instances
                .instance(instanceIndex));
    } catch (Exception e) {
        e.printStackTrace();
    }

    map.put("Class", outcomeValue);

    if (percentage.length > 0) {
        double percentageRaw = 0;
        if (outcomeValue == new Double(1)) {
            percentageRaw = percentage[1];
        } else {
            percentageRaw = 1 - percentage[0];
        }
        map.put("Percentage", percentageRaw);
    } else {
        // because J48 returns an error if percentage[i] because it's empty
        map.put("Percentage", new Double(0));
    }

    return map;
}
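
A side note on the fallback branch above: `percentage` starts as the one-element array `{ 0 }`, so if the classifier call throws, `percentage.length > 0` still holds and the method reports `1 - 0 = 1` as the percentage. A minimal sketch of a more defensive lookup follows. `DistributionUtil` and `probabilityOf` are hypothetical names used only for illustration; they are not part of WEKA.

```java
import java.util.OptionalDouble;

/** Hypothetical helper for illustration; not part of WEKA. */
final class DistributionUtil {

    private DistributionUtil() {}

    /**
     * Returns the probability stored at classIndex, or an empty result when
     * the distribution is null or too short to contain that index.
     */
    static OptionalDouble probabilityOf(double[] distribution, int classIndex) {
        if (distribution == null || classIndex < 0 || classIndex >= distribution.length) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(distribution[classIndex]);
    }
}
```

With this, a caller could start from `double[] percentage = null;` and store a percentage only when `probabilityOf(percentage, 1).isPresent()`, so a failed prediction is not reported as a confident one.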


Here are the models I use to predict the outcome for time2; hence we will use index 6:

instances.setClassIndex(5);
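
Attribute indices in WEKA are 0-based and follow the declaration order in the ARFF header, so in the header above ext2 sits at index 5 and time2 at index 6. The sketch below derives the index from the header lines with plain Java. `ArffAttributeIndex` is a hypothetical helper, and it assumes unquoted attribute names without spaces. With WEKA itself on the classpath, `instances.attribute("time2").index()` should give the same number.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of WEKA. */
final class ArffAttributeIndex {

    private static final String KEYWORD = "@attribute";

    private ArffAttributeIndex() {}

    /** Lists attribute names in declaration order; list position is the 0-based index. */
    static List<String> attributeNames(List<String> headerLines) {
        List<String> names = new ArrayList<>();
        for (String line : headerLines) {
            String trimmed = line.trim();
            // Case-insensitive check for a line starting with "@attribute"
            if (trimmed.regionMatches(true, 0, KEYWORD, 0, KEYWORD.length())) {
                // Simplification: the name is the second whitespace-separated token
                String[] parts = trimmed.split("\\s+", 3);
                if (parts.length >= 2) {
                    names.add(parts[1]);
                }
            }
        }
        return names;
    }

    /** Returns the 0-based index of the named attribute, or -1 if it is not declared. */
    static int indexOf(List<String> headerLines, String attributeName) {
        return attributeNames(headerLines).indexOf(attributeName);
    }
}
```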

Problems

  • As I said before, LogitBoost and ADTree have no problem in this straightforward method compared to the other three, as I followed the "Use WEKA in your Java code" tutorial.

  • [Solved] Based on my tweaking, RandomForest and RandomSubSpace throw an ArrayIndexOutOfBoundsException when told to predict time2 to time10.

    java.lang.ArrayIndexOutOfBoundsException: 0
        at weka.classifiers.meta.Bagging.distributionForInstance(Bagging.java:586)
        at weka.classifiers.trees.RandomForest.distributionForInstance(RandomForest.java:602)
        at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
    

    The stack trace points to this line as the root of the error:

    outcomeValue = classifier.classifyInstance(instances.instance(0));
    

  • [Solved] The J48 decision tree has a new problem now. Instead of failing to return any predictions, it now throws an error:

    java.lang.ArrayIndexOutOfBoundsException: 11
        at weka.core.DenseInstance.value(DenseInstance.java:332)
        at weka.core.AbstractInstance.isMissing(AbstractInstance.java:315)
        at weka.classifiers.trees.j48.C45Split.whichSubset(C45Split.java:494)
        at weka.classifiers.trees.j48.ClassifierTree.getProbs(ClassifierTree.java:670)
        at weka.classifiers.trees.j48.ClassifierTree.classifyInstance(ClassifierTree.java:231)
        at weka.classifiers.trees.J48.classifyInstance(J48.java:266)
    

    and it traces to the line

    outcomeValue = classifier.classifyInstance(instances.instance(0));
    


I hope someone can help me sort out this issue. I really do not know what is wrong with this code: I have checked the Javadocs and the examples online, and the constant predictions still persist.

(I am currently checking the main program for the WEKA GUI but please help me out here :-) )

Solution

I've only looked at the RandomForest problem for now. It is because the Bagging class extracts the number of different classes from the data instance itself, and not from the model. You say in your text that time2 to time10 are binary, but you just don't say it in your ARFF file, and so the Bagging class has no clue about how many classes there are.
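
To make that concrete, here is a simplified sketch of the failure mode. It is not WEKA's Bagging source, only an illustration under the assumption that the vote array is sized from the class count declared in the data header; `VoteSumSketch` and `sumVotes` are made-up names. A header that declares `{}` yields a class count of 0, which would be consistent with the index 0 in the exception message.

```java
/** Simplified illustration only; this is not WEKA's Bagging source code. */
final class VoteSumSketch {

    private VoteSumSketch() {}

    /**
     * Sums per-class probabilities from several ensemble members into an array
     * whose length comes from the data header, not from the members' output.
     */
    static double[] sumVotes(int numClassesInHeader, double[][] memberDistributions) {
        double[] sums = new double[numClassesInHeader];
        for (double[] dist : memberDistributions) {
            for (int j = 0; j < dist.length; j++) {
                // Throws ArrayIndexOutOfBoundsException when the header
                // declares fewer classes than the member reports
                sums[j] += dist[j];
            }
        }
        return sums;
    }
}
```

Calling `sumVotes(2, ...)` with two-class distributions works, while `sumVotes(0, ...)` with the same input fails on the very first element.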

So you just have to specify in your ARFF file that time2 is binary, e.g.:

  @attribute time2 {0,1}

and you won't get any Exception any more.

I've not looked at the J48 problem, because it may be the same issue with the ARFF definition.
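
If you want to catch this before handing the data to a model, a quick check of the header for nominal attributes declared with no values might look like the sketch below. `EmptyNominalCheck` is a hypothetical helper written in plain Java, not a WEKA class, and it assumes unquoted attribute names without spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of WEKA. */
final class EmptyNominalCheck {

    // Matches "@attribute <name> {}" with optional whitespace inside the braces.
    private static final Pattern EMPTY_NOMINAL = Pattern.compile(
            "^\\s*@attribute\\s+(\\S+)\\s*\\{\\s*\\}\\s*$", Pattern.CASE_INSENSITIVE);

    private EmptyNominalCheck() {}

    /** Returns the names of attributes whose nominal value list is empty. */
    static List<String> findEmptyNominals(List<String> headerLines) {
        List<String> result = new ArrayList<>();
        for (String line : headerLines) {
            Matcher m = EMPTY_NOMINAL.matcher(line);
            if (m.matches()) {
                result.add(m.group(1));
            }
        }
        return result;
    }
}
```

Run against the header in the question, this would flag time2 through time10, which are exactly the attributes the models are asked to predict.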

Test code:

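  import java.util.Arrays;

  import weka.classifiers.Classifier;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  // Wrapper class added so the snippet compiles; the name is arbitrary.
  public class J48Test {

    public static void main(String[] argv) {
      try {
        // Load the serialized J48 model
        Classifier cls = (Classifier) weka.core.SerializationHelper.read("bosom.100k.2.j48.MODEL");
        J48 c = (J48) cls;

        // Load the ARFF data; time2 is the attribute at 0-based index 6
        DataSource source = new DataSource("data.arff");
        Instances data = source.getDataSet();
        data.setClassIndex(6);

        try {
          // Predicted class value and class distribution for the first row
          double outcomeValue = c.classifyInstance(data.instance(0));
          System.out.println("outcome " + outcomeValue);
          double[] p = c.distributionForInstance(data.instance(0));
          System.out.println(Arrays.toString(p));
        } catch (Exception e) {
          e.printStackTrace();
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }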
