Problem description
I have a set of more than 2000 numbers gathered from measurements. I want to sample from this dataset, about 10 values per test, while preserving the probability distribution both overall and, as far as possible, within each test. For example, in each test I want some small values, some mid-range values, and some large values, with the mean and variance approximately matching the original distribution. When all the tests are combined, I also want the total mean and variance of all the samples to be approximately those of the original distribution.
Because my dataset has a long-tailed distribution, the amount of data in each value range is not the same:
Fig 1. Density plot of ~2k elements of data.
I am using Java. Right now I draw a uniformly distributed random int as an index into the dataset and return the data element at that position:
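import java.util.Random;

public int getRandomData() {
    // Measured values (list elided in the original post)
    int[] data = {1231, 414, 222, 4211, /* ... */ 41, 203, 123, 432 /* , ... */};
    int length = data.length;
    Random r = new Random();
    // Uniform random index in [0, length)
    int randomInt = r.nextInt(length);
    return data[randomInt];
}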
I don't know whether this works as I want, because the data is stored in the order it was measured, and it has a large amount of serial correlation.
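It works as you want. The order of the data is irrelevant: each index is drawn uniformly and independently, so every element is equally likely to be picked no matter where it sits in the array.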
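One way to convince yourself of this is a small empirical check. The sketch below is not from the original post: the class name SamplingCheck, the helper methods, and the synthetic long-tailed data are all assumptions made for illustration. It samples by uniform random index from the same values stored in two different orders and prints the mean and variance of each sample next to those of the full dataset.

```java
import java.util.Arrays;
import java.util.Random;

public class SamplingCheck {
    // Draw n elements by picking uniform random indices (with replacement).
    public static int[] sample(int[] data, int n, Random rng) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = data[rng.nextInt(data.length)];
        }
        return out;
    }

    public static double mean(int[] xs) {
        double sum = 0;
        for (int x : xs) sum += x;
        return sum / xs.length;
    }

    // Population variance (divides by n, not n - 1).
    public static double variance(int[] xs) {
        double m = mean(xs);
        double sumSq = 0;
        for (int x : xs) sumSq += (x - m) * (x - m);
        return sumSq / xs.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in data, NOT the asker's measurements:
        // 2000 long-tailed values generated from a log-normal shape.
        int[] data = new int[2000];
        Random gen = new Random(1);
        for (int i = 0; i < data.length; i++) {
            data[i] = (int) (100 * Math.exp(gen.nextGaussian()));
        }

        // Same values in a different storage order.
        int[] sorted = data.clone();
        Arrays.sort(sorted);

        int[] fromOriginal = sample(data, 100_000, new Random(42));
        int[] fromSorted = sample(sorted, 100_000, new Random(43));

        System.out.printf("dataset:       mean=%.1f var=%.1f%n", mean(data), variance(data));
        System.out.printf("original order: mean=%.1f var=%.1f%n", mean(fromOriginal), variance(fromOriginal));
        System.out.printf("sorted order:   mean=%.1f var=%.1f%n", mean(fromSorted), variance(fromSorted));
    }
}
```

With many draws, both samples should land close to the dataset's own mean and variance. A single test of about 10 values will still fluctuate noticeably, especially with a long tail, so the match is only approximate per test and tightens as the tests are combined.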