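I have a fixed number of samples, each with an associated probability. I now want to resample from this data source so that the number of new samples equals the original count and every new sample has the same probability.
For example: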
                                           random | 0.03 | 0.78 | 0.45 | 0.70
                                           -------+------+------+------+------
sample | 0000 | 0001 | 0002 | 0003   RNG   sample | 0000 | 0003 | 0002 | 0003
-------+------+------+------+------ ====>  -------+------+------+------+------
 prob. | 0.10 | 0.20 | 0.30 | 0.40          prob. | 0.25 | 0.25 | 0.25 | 0.25
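The mapping in this example can be reproduced by walking the cumulative probabilities (0.10, 0.30, 0.60, 1.00) and picking the first sample whose cumulative value exceeds the random draw. The following is a minimal sketch of that idea; the helper names `pick_index` and `resample_with_draws` are made up for illustration, and the draws are assumed to lie in [0, 1).

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first sample whose cumulative probability
 * exceeds u. Assumes u lies in [0, 1) and probs sums to 1.
 * Hypothetical helper, for illustration only. */
static size_t pick_index(const double *probs, size_t n, double u)
{
    assert(n > 0);
    double cumulative = 0.0;
    for (size_t i = 0; i < n; i++) {
        cumulative += probs[i];
        if (u < cumulative)
            return i;
    }
    return n - 1; /* guard against rounding when u is very close to 1 */
}

/* Map each of the n draws to a sample index, one draw per new sample. */
static void resample_with_draws(const double *probs, size_t n,
                                const double *draws, size_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = pick_index(probs, n, draws[i]);
}
```

With the probabilities and random values from the example, the draws 0.03, 0.78, 0.45 and 0.70 map to samples 0000, 0003, 0002 and 0003. The linear scan is only meant to make the mapping explicit; it costs O(n) per draw.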
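In my case the probabilities are not given directly; I have weights instead. However, the probabilities can be derived directly from the weights, because the sum of all weights is known (but not constant).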
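As a small sketch of that derivation, each probability is simply the weight divided by the weight total. The helper name `normalize_weights` is an assumption, and the function recomputes the total itself; if the total is already known, as described above, the first loop could be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Convert non-negative weights to probabilities in place.
 * Returns the weight total, or 0.0 (leaving the array untouched)
 * if the total is not positive. Hypothetical helper, for illustration. */
static double normalize_weights(double *w, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(w[i] >= 0.0);
        total += w[i];
    }
    if (total <= 0.0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        w[i] /= total;
    return total;
}
```

Normalizing is not strictly required for sampling: scaling the random draw by the weight total gives the same result without touching the weights.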
In the MATLAB implementation, I used the randsample function from the Statistics Toolbox to perform this resampling:
y = randsample(n,k,true,w) or y = randsample(population,k,true,w) returns a weighted sample taken with replacement, using a vector of positive weights w, whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w). Usually, w is a vector of probabilities. randsample does not support weighted sampling without replacement.

    function [samples, probabilities] = resample(samples, probabilities)
        sampleCount = size(samples, 1);
        indices = randsample(1 : sampleCount, sampleCount, true, probabilities);
        samples = samples(indices, :);
        probabilities = repmat(1 / sampleCount, sampleCount, 1);
    end
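I now want to port this part of the algorithm to the iPad 2, where it is used to update real-time (~25 fps) data, resampling 512 samples each time. Time efficiency is therefore critical, since other computations will be performed as well. Memory usage does not have to be minimized.
I have looked into the Alias method, but building its structure seems rather tedious, and it may not be the most efficient solution.
Are there other efficient methods that meet the real-time requirement, or is the Alias method the way to go?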
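For reference, the structure building that the question describes can be sketched in a few dozen lines. The sketch below uses Vose's variant of the Alias method; the names `AliasTable`, `alias_build`, `alias_draw` and `alias_free` are made up for illustration, the input is assumed to be probabilities that sum to 1, and the two uniform draws are passed in by the caller so that any random source can be used.

```c
#include <assert.h>
#include <stdlib.h>

/* Alias table: slot i yields sample i with probability prob[i],
 * otherwise it yields alias[i]. */
typedef struct {
    size_t  n;
    double *prob;
    size_t *alias;
} AliasTable;

/* Build the table in O(n) from probabilities p[0..n-1] that sum to 1.
 * Returns 0 on success, -1 on allocation failure. */
static int alias_build(AliasTable *t, const double *p, size_t n)
{
    assert(n > 0);
    t->n = n;
    t->prob = malloc(n * sizeof *t->prob);
    t->alias = malloc(n * sizeof *t->alias);
    double *scaled = malloc(n * sizeof *scaled);
    size_t *small = malloc(n * sizeof *small);
    size_t *large = malloc(n * sizeof *large);
    if (!t->prob || !t->alias || !scaled || !small || !large) {
        free(t->prob); free(t->alias);
        free(scaled); free(small); free(large);
        return -1;
    }
    /* Scale so the average is 1, then split into below/above average. */
    size_t ns = 0, nl = 0;
    for (size_t i = 0; i < n; i++) {
        scaled[i] = p[i] * (double)n;
        if (scaled[i] < 1.0) small[ns++] = i; else large[nl++] = i;
    }
    /* Pair each small slot with a large one that fills the remainder. */
    while (ns > 0 && nl > 0) {
        size_t s = small[--ns], l = large[--nl];
        t->prob[s] = scaled[s];
        t->alias[s] = l;
        scaled[l] = (scaled[l] + scaled[s]) - 1.0;
        if (scaled[l] < 1.0) small[ns++] = l; else large[nl++] = l;
    }
    /* Leftovers equal 1 up to rounding error. */
    while (nl > 0) { size_t l = large[--nl]; t->prob[l] = 1.0; t->alias[l] = l; }
    while (ns > 0) { size_t s = small[--ns]; t->prob[s] = 1.0; t->alias[s] = s; }
    free(scaled); free(small); free(large);
    return 0;
}

/* Draw one index in O(1); u1 and u2 are independent uniforms in [0, 1). */
static size_t alias_draw(const AliasTable *t, double u1, double u2)
{
    size_t i = (size_t)(u1 * (double)t->n);
    if (i >= t->n) i = t->n - 1;
    return u2 < t->prob[i] ? i : t->alias[i];
}

static void alias_free(AliasTable *t) { free(t->prob); free(t->alias); }
```

Because the weights change on every update, the table would have to be rebuilt each frame: O(n) for the build plus O(1) per draw. Whether that beats a cumulative-sum search for only 512 samples is something that would need to be measured on the device.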
Best answer
Here is an example of how to implement resample in C.
    #include <stdlib.h>   // random, srandomdev, malloc, free, bsearch_b (BSD/Apple)
    #include <string.h>   // memcpy

    typedef int SampleType;
    typedef double ProbabilityType;

    // Returns a uniformly distributed value in [0, total]
    static ProbabilityType MyRandomFunction(ProbabilityType total)
    {
        static int isRandomReady = 0;
        if ( ! isRandomReady ) {
            srandomdev();
            isRandomReady = 1;
        }
        // random() returns a value in [0, 2^31 - 1]
        const ProbabilityType randomMax = 2147483647.0;
        return random() * (total / randomMax);
    }

    static void MyResampleFunction(SampleType *samples, ProbabilityType *probabilities, size_t length)
    {
        // create a copy of samples as samples will be modified
        SampleType *sampleCopies = malloc(sizeof(SampleType) * length);
        if ( sampleCopies == NULL )
            return; // out of memory: leave the input untouched
        memcpy(sampleCopies, samples, sizeof(SampleType) * length);

        ProbabilityType total = 0;
        // replace probabilities with cumulative sums
        for ( size_t i = 0; i < length; i++ )
            probabilities[i] = total += probabilities[i];

        for ( size_t i = 0; i < length; i++ )
        {
            ProbabilityType probability = MyRandomFunction(total);
            // We could iterate through the probabilities array, but binary search is more efficient

            // This is a block declaration
            int (^comparator)(const void *, const void *);
            // Blocks are like function pointers,
            // except they capture their enclosing scope
            comparator = ^(const void *leftPtr, const void *rightPtr) {
                // leftPtr points to probability
                // rightPtr to an element in probabilities
                ProbabilityType curr, prev;
                size_t idx = ((const ProbabilityType *) rightPtr) - probabilities;
                curr = probabilities[idx];                   // current cumulative probability
                prev = idx > 0 ? probabilities[idx - 1] : 0; // previous cumulative probability
                if ( curr < probability )
                    return 1;
                if ( prev > probability )
                    return -1;
                return 0;
            };
            void *found = bsearch_b(&probability,            // the searched value
                                    probabilities,           // the searched array
                                    length,                  // the length of array
                                    sizeof(ProbabilityType), // the size of values
                                    comparator);             // the comparator
            // Rounding can push the draw slightly above total; fall back to the last entry
            size_t idx = found ? (size_t)(((const ProbabilityType *) found) - probabilities)
                               : length - 1;
            samples[i] = sampleCopies[idx];
        }
        // now, probabilities are all the same
        for ( size_t i = 0; i < length; i++ )
            probabilities[i] = 1.0 / length;
        // now we can dispose of the copies
        free(sampleCopies);
    }

    static void MyTestFunction()
    {
        SampleType samples[4] = {0, 1, 2, 3};
        ProbabilityType probabilities[4] = {0.1, 0.2, 0.3, 0.4};
        MyResampleFunction(samples, probabilities, 4);
    }
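The code above relies on blocks and bsearch_b, which are available with Apple's toolchain but are not standard C. As a hedged alternative, the same cumulative-sum approach can be written with a hand-rolled binary search. The names `upper_bound` and `resample_portable` are made up for this sketch, and `rand()` is used only to keep it portable; it is an assumption, and a better generator such as `random()` is likely preferable on iOS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index of the first cumulative sum greater than u.
 * cum must be non-decreasing; falls back to the last index
 * if rounding leaves no such entry. */
static size_t upper_bound(const double *cum, size_t n, double u)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cum[mid] > u)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo < n ? lo : n - 1;
}

/* Resample in place with replacement. On return, every probability
 * is 1/length. Returns 0 on success, -1 on allocation failure. */
static int resample_portable(int *samples, double *probabilities, size_t length)
{
    assert(length > 0);
    int *copies = malloc(length * sizeof *copies);
    if (copies == NULL)
        return -1;
    memcpy(copies, samples, length * sizeof *copies);

    /* Replace the probabilities (or weights) with cumulative sums. */
    double total = 0.0;
    for (size_t i = 0; i < length; i++)
        probabilities[i] = total += probabilities[i];

    for (size_t i = 0; i < length; i++) {
        /* Uniform draw in [0, total). */
        double u = (rand() / ((double)RAND_MAX + 1.0)) * total;
        samples[i] = copies[upper_bound(probabilities, length, u)];
    }

    for (size_t i = 0; i < length; i++)
        probabilities[i] = 1.0 / (double)length;

    free(copies);
    return 0;
}
```

Each call costs O(n log n) for n samples: one pass for the cumulative sums plus one binary search per draw. Because the draw is scaled by the total, the input does not need to be normalized first, so raw weights can be passed in directly.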
Regarding "objective-c - Efficient algorithm for weighted random sampling with replacement", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8605065/