How does List::Util 'shuffle' actually work?

Question

I am currently building a classifier using C5.0. I have a dataset of 8000 entries, and each entry has its own ID number (1-8000). To test the performance of the classifier I have to make 5 sets of 10:90 (training data : test data) splits. Of course, no training case may appear again in the test cases, and no duplicates may occur in either set.

To pick examples at random for the training data while making sure the same examples cannot be picked for the test data, I have developed a horribly slow method:

  • Fill a file with the numbers 1-8000 on separate lines.

  • Randomly pick a line number (from the range 1-8000) and use the contents of that line as the ID number of the training example.

  • Write all unpicked numbers to a new file.

  • Decrement the range of the random number generator by 1.

  • Repeat.

Then all unpicked numbers are used as test data. It works, but it's slow. To speed things up I could use List::Util 'shuffle' to just 'randomly' shuffle an array of these numbers. But how random is 'shuffle'? It is essential that the same level of accuracy is maintained. Sorry about the essay, but does anyone know how 'shuffle' actually works? Any help at all would be great.
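The shuffle-based idea in the last paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions: `split_ids` is a hypothetical helper name, the training size of 800 is simply 10% of 8000, and the five splits are assumed to be drawn independently of one another.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: shuffle the ids once, then cut the permutation
# into a training part of $n_train ids and a test part with the rest.
sub split_ids {
    my ($n_train, @ids) = @_;
    my @shuffled = shuffle(@ids);                      # one random permutation
    my @train    = @shuffled[0 .. $n_train - 1];       # first $n_train ids
    my @test     = @shuffled[$n_train .. $#shuffled];  # everything else
    return (\@train, \@test);
}

# Five independent 10:90 splits of the ids 1..8000 (800 training ids each).
my @splits = map { [ split_ids(800, 1 .. 8000) ] } 1 .. 5;

printf "split %d: %d training ids, %d test ids\n",
    $_ + 1, scalar @{ $splits[$_][0] }, scalar @{ $splits[$_][1] }
    for 0 .. $#splits;
```

Because both sets are slices of a single permutation, an ID cannot land in both sets or appear twice in either, so no bookkeeping file is needed.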

Recommended answer

Here is the implementation from List::Util::PP:

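sub shuffle (@) {
  my @a=\(@_);        # references to each of the arguments
  my $n;
  my $i=@_;           # number of elements not yet picked
  map {
    $n = rand($i--);  # random position among the unpicked slots (truncated when used as an index)
    # return the picked element, then move the last unpicked reference into its slot
    (${$a[$n]}, $a[$n] = $a[$i])[0];
  } @_;
}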

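This looks like a Fisher-Yates shuffle. A Fisher-Yates shuffle produces every permutation with equal probability as long as the underlying random number generator is uniform, so the randomness of 'shuffle' here comes down to the quality of Perl's rand.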
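For comparison, the textbook in-place form of the algorithm might look like the sketch below. It is not the List::Util code; `fisher_yates_shuffle` is a hypothetical name used only for illustration, and it relies on Perl's built-in rand just as the routine above does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of List::Util): shuffle an array in place.
sub fisher_yates_shuffle {
    my ($array) = @_;                        # array reference
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # integer index in 0 .. $i
        @$array[$i, $j] = @$array[$j, $i];   # swap the two elements
    }
    return $array;
}

my @ids = (1 .. 10);
fisher_yates_shuffle(\@ids);
print "@ids\n";
```

Each pass picks one of the not-yet-fixed positions and swaps it to the end of the unfixed region, which is the same pick-and-shrink step the question performs with files, done in memory in a single pass.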


08-13 19:50