


I have a data set in which a variable z holds around 4000 values (from 0.0 to 1.0), and its histogram looks like this.


Now I need to generate a random variable, call it random_z, which should replicate the above distribution.


What I have tried so far is to generate a normal distribution centered at 1.0, so that I can remove all the values above 1.0 and get a similar distribution. I have been using numpy.random.normal, but the problem is that I cannot restrict the range to 0.0 to 1.0, because the normal distribution usually has mean = 0.0 and std dev = 1.0.


Is there another way to go about generating this distribution in Python?



If you want to bootstrap you could use random.choice() on your observed series.
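As a minimal sketch of the bootstrap idea, assuming NumPy is available: the example below uses NumPy's Generator.choice rather than the standard-library random.choice(), because it draws many values in one call. The array z here is a made-up stand-in for the observed data, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Stand-in for the observed data (assumption); in practice z holds your ~4000 values.
z = rng.random(4000) ** 3

# Bootstrap: draw 1000 values from z with replacement.
random_z = rng.choice(z, size=1000, replace=True)
```

Every value in random_z is one of the observed values, so this reproduces the sample exactly but never produces anything new.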


Here I'll assume you'd like to smooth a bit more than that and you aren't concerned with generating new extreme values.


Use pandas.Series.quantile() and a uniform [0,1] random number generator, as follows.


  • Put your random sample into a pandas Series; call this Series S


  1. Generate a random number u between 0.0 and 1.0 the usual way, e.g., random.random()
  2. Return S.quantile(u)
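The steps above might look like the following sketch. The values placed in S are placeholders and the helper name draw_like is hypothetical; only random.random() and Series.quantile() come from the steps themselves.

```python
import random

import pandas as pd

# Placeholder sample (assumption); replace with your observed values.
S = pd.Series([0.05, 0.10, 0.20, 0.40, 0.80, 0.95])


def draw_like(sample):
    """Return one value drawn from a distribution resembling `sample`."""
    u = random.random()        # step 1: uniform random number in [0, 1)
    return sample.quantile(u)  # step 2: the sample quantile at level u


random_z = [draw_like(S) for _ in range(10)]
```

Because quantile() interpolates between neighbouring observations by default, the draws can fall between observed values but never outside the range of S.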


If you'd rather use numpy than pandas, from a quick reading it looks like you can substitute numpy.percentile() in step 2.



From the sample S, pandas.Series.quantile() or numpy.percentile() is used to calculate the inverse cumulative distribution function, as required by the method of inverse transform sampling. The quantile or percentile function (relative to S) transforms a uniform [0,1] pseudo-random number into a pseudo-random number having the range and distribution of the sample S.
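The reason this works can be stated in one line. As a sketch, assume F is the cumulative distribution function of the target distribution and that it is continuous and strictly increasing (the symbols F, U and X are introduced here only for illustration):

```latex
U \sim \mathrm{Uniform}[0,1], \qquad X = F^{-1}(U)
\;\Longrightarrow\;
P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x).
```

With a finite sample S, the sample quantile function stands in for the inverse of F, so the new values follow the distribution of S only approximately.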


If you need to minimize coding and don't want to write and use functions that return only a single realization, then numpy.percentile seems to beat pandas.Series.quantile.


Let S be a pre-existing sample.


u will be the new uniform random numbers.


newR will be the new randoms drawn from an S-like distribution.

>>> import numpy as np


First I need a sample of the kind of random numbers to be duplicated, to put in S.


For the purposes of creating an example, I am going to raise some uniform [0,1] random numbers to the third power and call that the sample S. By choosing to generate the example sample in this way, I will know in advance -- from the mean being equal to the definite integral of (x^3)(dx) evaluated from 0 to 1 -- that the mean of S should be 1/(3+1) = 1/4 = 0.25


In your application, you would need to do something else instead, perhaps read a file, to create a numpy array S containing the data sample whose distribution is to be duplicated.
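For instance, reading the sample from a text file could look like this sketch. The file name z_values.txt and its one-value-per-line layout are assumptions; the file is written first only so that the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical data file (assumption): one value per line.
# It is created here only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "z_values.txt")
np.savetxt(path, np.random.random(4000) ** 3)

# In a real application only this line would be needed.
S = np.loadtxt(path)
```

The rest of this walkthrough continues with a generated S instead.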

>>> S = pow(np.random.random(1000),3)  # S will be 1000 samples of a power distribution


Here I will check that the mean of S is 0.25 as stated above.

>>> S.mean()
0.25296623781420458 # OK


Get the min and max, just to show how np.percentile works.

>>> S.min()
>>> S.max()


The numpy.percentile function maps 0-100 to the range of S.

>>> np.percentile(S,0)  # this should match the min of S
6.1091277680105382e-10 # and it does

>>> np.percentile(S,100) # this should match the max of S
0.99608676594692624 # and it does

>>> np.percentile(S,[0,100])  # this should send back an array with both min, max
[6.1091277680105382e-10, 0.99608676594692624]  # and it does

>>> np.percentile(S,np.array([0,100])) # but this doesn't....
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


This isn't so great if we generate 100 new values, starting with uniforms:

>>> u = np.random.random(100)


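because passing the array u directly errors out as shown above, and because u is on a 0-1 scale while np.percentile expects 0-100. Scaling by 100 and converting to a list handles both: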


>>> newR = np.percentile(S, (100*u).tolist())


which works fine but might need its type adjusted if you want a numpy array back

>>> type(newR)
<type 'list'>

>>> newR = np.array(newR)


Now we have a numpy array. Let's check the mean of the new random values.

>>> newR.mean()
0.25549728059744525 # close enough
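On a recent NumPy the same steps can be written more compactly. The sketch below assumes a version that provides np.quantile, which takes levels on a 0-1 scale and accepts an array of them, so neither the multiplication by 100 nor the list conversion is needed. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, only for reproducibility

# Example sample: cubes of uniform [0, 1) numbers, as in the session above.
S = rng.random(1000) ** 3

# New uniform random numbers, one per value to generate.
u = rng.random(100)

# Inverse transform sampling: evaluate the sample quantile function at u.
newR = np.quantile(S, u)

print(S.mean(), newR.mean())  # both should be in the neighbourhood of 0.25
```

Here newR is already a numpy array, so no type adjustment is needed.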

