Question

From what I have read, I always thought that cross-validation is performed like this:

So k models are built, and the final one is the average of those. The Weka guide states that each model is always built using ALL of the data set. So how does cross-validation in Weka work? Is the model built from all the data, with "cross-validation" meaning that k folds are created, each fold is evaluated, and the final output is simply the result averaged over the folds?

Recommended answer

So, here is the scenario again: you have 100 labeled data points.

Use training set

  • Weka takes the 100 labeled data points.
  • It applies an algorithm to build a classifier from these 100 data points.
  • It applies that classifier AGAIN to the same 100 data points.
  • It reports the performance of the classifier (applied to the same 100 data points from which it was developed).
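The steps above can be sketched in a few lines. This is not Weka code: it is a minimal plain-Python illustration, and the toy data generator `make_data`, the 1-nearest-neighbour learner `train_1nn`, and the `accuracy` helper are all hypothetical names chosen for the example. It shows why the figure reported on the training set tends to be optimistic: a 1-nearest-neighbour classifier scores perfectly on the very points it has memorised, even though 20% of the labels are noise.

```python
import random

def make_data(n=100, seed=0):
    """Hypothetical toy data: one numeric feature and a noisy binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.2:  # flip roughly 20% of labels to simulate noise
            label = 1 - label
        data.append((x, label))
    return data

def train_1nn(train):
    """'Training' a 1-nearest-neighbour classifier just stores the data."""
    stored = list(train)
    def predict(x):
        # Return the label of the stored point closest to x.
        return min(stored, key=lambda p: abs(p[0] - x))[1]
    return predict

def accuracy(predict, test):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

data = make_data()                 # the 100 labeled data points
clf = train_1nn(data)              # build the classifier from all 100
train_acc = accuracy(clf, data)    # apply it AGAIN to the same 100
print(f"accuracy on the training set: {train_acc:.2f}")
```

Every point is its own nearest neighbour here, so the reported accuracy says nothing about how the classifier would behave on data it has not seen.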

Use 10-fold CV

  • Weka takes the 100 labeled data points.
  • It produces 10 equal-sized sets. Each set is divided into two groups: 90 labeled data points are used for training and 10 are used for testing.
  • It builds a classifier with the algorithm from the 90 training data points and applies it to the 10 test data points of set 1.
  • It does the same for sets 2 to 10, producing 9 more classifiers.
  • It averages the performance of the 10 classifiers produced from the 10 equal-sized sets (90 training and 10 test data points each).
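The 10-fold procedure described above can be sketched as follows. Again this is plain Python rather than Weka's implementation, and `make_data`, `train_1nn`, `accuracy`, and `cross_validate` are hypothetical helpers written for this illustration. The sketch shuffles the data once and does not stratify the folds by class; as far as I know, Weka stratifies folds for nominal classes and pools predictions across folds, which for equal-sized folds gives the same accuracy as averaging.

```python
import random

def make_data(n=100, seed=0):
    """Hypothetical toy data: one numeric feature and a noisy binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.2:  # flip roughly 20% of labels to simulate noise
            label = 1 - label
        data.append((x, label))
    return data

def train_1nn(train):
    """'Training' a 1-nearest-neighbour classifier just stores the data."""
    stored = list(train)
    def predict(x):
        return min(stored, key=lambda p: abs(p[0] - x))[1]
    return predict

def accuracy(predict, test):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

def cross_validate(data, k=10, seed=1):
    """Return one accuracy score per fold (plain, unstratified k-fold)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]  # equal sizes when len(data) % k == 0
    scores = []
    for i in range(k):
        test = folds[i]                      # 10 held-out points
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        clf = train_1nn(train)               # classifier built from the other 90
        scores.append(accuracy(clf, test))   # evaluated only on unseen points
    return scores

data = make_data()
scores = cross_validate(data, k=10)
cv_estimate = sum(scores) / len(scores)      # average over the 10 classifiers
print(f"per-fold accuracy: {scores}")
print(f"10-fold CV estimate: {cv_estimate:.2f}")
```

Note that only the performance figures are averaged, not the classifiers themselves. To my understanding, this is also how the statement in the Weka guide fits in: the model Weka displays is built from all the data, and the cross-validation result is an estimate of how well that model is likely to perform on unseen data.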

Let me know if that answers your question.
