本文介绍了如何从具有非均匀概率的列表中选择值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在查看初始化算法。该算法的以下两个步骤产生非均匀概率:

I am looking at the k-means++ initialization algorithm. The following two steps of the algorithm give rise to non-uniform probabilities:

使用加权$ b $随机选择一个新数据点作为新中心b概率分布,其中点x以与D(x)^ 2成比例的概率
选择。

Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.

使用C ++中的这个所述加权概率分布来选择

How can I select with this stated weighted probability distribution in C++?

推荐答案

概率分布。

最简单的方法是按顺序枚举点X,并计算一个表示它们累积概率分布函数的数组:(伪代码如下)

The easiest way to do this is to enumerate the points X in order, and calculate an array representing their cumulative probability distribution function: (pseudocode follows)

/* 
 * xset is an array of points X,
 * cdf is a preallocated array of the same size
 */
function prepare_cdf(X[] xset, float[] cdf)
{
   float S = 0;
   int N = sizeof(xset);
   for i = 0:N-1
   {
      float weight = /* calculate D(xset[i])^2 here */
      // create cumulative sums and write to the element in cdf array
      S += weight;
      cdf[i] = S;
   }

   // now normalize so the CDF runs from 0 to 1
   for i = 0:N-1
   {
      cdf[i] /= S;
   }
}

function select_point(X[] xset, float[] cdf, Randomizer r)
{
   // generate a random floating point number from a 
   // uniform distribution from 0 to 1
   float p = r.nextFloatUniformPDF();
   int i = binarySearch(cdf, p);
   // find the lowest index i such that p < cdf[i]

   return xset[i];
}

您调用prepare_cdf一次,然后根据需要调用select_point生成随机点。

You call prepare_cdf once, and then call select_point as many times as you need to generate random points.

这篇关于如何从具有非均匀概率的列表中选择值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-11 22:01