Problem description
I'm applying text classification in Weka using the NaiveBayesMultinomialText classifier. The problem is that when I do it through the GUI and test on the same training data (without cross-validation), I get 93% accuracy, but when I try to do the same thing via Java code I get 67% accuracy. What might be wrong?
In the GUI, I'm using the following configuration (a sketch of these settings in code follows below):
Lnorm 2.0
debug False
lowercaseTokens True
minWordFrequency 3.0
norm 1.0
normalizeDocLength False
periodicPruning 0
stemmer NullStemmer
stopwords pt-br-stopwords.dat
tokenizer NGramTokenizer (default parameters, but max ngram size = 2)
useStopList True
useWordFrequencies True
And then I select "Use training set" in "Test options".
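For reference, here is a hedged sketch of how the GUI settings above could be replicated through the classifier's setters instead of a single option string. The setter names follow Weka's usual bean-property convention (e.g. the GUI property lowercaseTokens corresponds to setLowercaseTokens), so they should be verified against the NaiveBayesMultinomialText version in use; stop-word handling is omitted because its API differs between Weka releases.

import weka.classifiers.bayes.NaiveBayesMultinomialText;
import weka.core.stemmers.NullStemmer;
import weka.core.tokenizers.NGramTokenizer;

public class GuiConfigSketch {
    // Builds a NaiveBayesMultinomialText configured like the GUI panel above.
    static NaiveBayesMultinomialText buildFromGuiSettings() {
        NaiveBayesMultinomialText nb = new NaiveBayesMultinomialText();
        nb.setLNorm(2.0);                  // Lnorm 2.0
        nb.setNorm(1.0);                   // norm 1.0
        nb.setLowercaseTokens(true);       // lowercaseTokens True
        nb.setMinWordFrequency(3.0);       // minWordFrequency 3.0
        nb.setStemmer(new NullStemmer());  // stemmer NullStemmer

        NGramTokenizer tok = new NGramTokenizer();
        tok.setNGramMinSize(1);
        tok.setNGramMaxSize(2);            // NGramTokenizer with max ngram size = 2
        nb.setTokenizer(tok);

        // useStopList / stopwords (pt-br-stopwords.dat) are left out here:
        // the stop-word API changed between Weka releases (setStopwords(File)
        // vs. setStopwordsHandler(...)), so check the version you are running.
        return nb;
    }
}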
Now, in Java code, I have:
Instances train = readArff("data/naivebayestest/corpus_treino.arff");
train.setClassIndex(train.numAttributes() - 1);
NaiveBayesMultinomialText nb = new NaiveBayesMultinomialText();
String opt = "-W -P 0 -M 5.0 -norm 1.0 -lnorm 2.0 -lowercase -stoplist -stopwords C:\\Users\\Fernando\\workspace\\GPCommentsAnalyzer\\pt-br_stopwords.dat -tokenizer \"weka.core.tokenizers.NGramTokenizer -delimiters ' \\r\\n\\t.,;:\\\'\\\"()?!\' -max 2 -min 1\" -stemmer weka.core.stemmers.NullStemmer";
nb.setOptions(Utils.splitOptions(opt));
nb.buildClassifier(train);
Evaluation eval = new Evaluation(train);
eval.evaluateModel(nb, train);
System.out.println(eval.toSummaryString());
System.out.println(eval.toClassDetailsString());
System.out.println(eval.toMatrixString());
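As a quick sanity check (a debugging sketch, not part of the original run), the options the classifier actually ended up with after setOptions(...) can be printed and compared with the GUI panel; getOptions() and weka.core.Utils.joinOptions are standard Weka API:

System.out.println(Utils.joinOptions(nb.getOptions()));  // effective options after parsing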
Probably I'm missing something in my Java code... Any ideas?
Thanks!
You can use the code below to evaluate your classifier with 10-fold cross-validation (10CV):
eval.crossValidateModel(nb, train, 10, new Random(1));
You should remember not to call train.randomize(...) or train.stratify(10) before that; crossValidateModel randomizes and stratifies its own copy of the data.
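For context, a minimal sketch of that 10-fold evaluation in place of the train-and-test-on-train run from the question (readArff and the ARFF path are taken from the question; java.util.Random and weka.classifiers.Evaluation are the required imports):

Instances train = readArff("data/naivebayestest/corpus_treino.arff");  // same helper as in the question
train.setClassIndex(train.numAttributes() - 1);

NaiveBayesMultinomialText nb = new NaiveBayesMultinomialText();
// ... configure nb here (setOptions or setters), exactly as for the single-split run ...

Evaluation eval = new Evaluation(train);
// crossValidateModel shuffles and stratifies its own copy of the data,
// so do not call train.randomize(...) or train.stratify(10) beforehand.
eval.crossValidateModel(nb, train, 10, new Random(1));
System.out.println(eval.toSummaryString());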