Problem description
I'm trying to draw a random number from a distribution in SciPy, just like you would with stats.norm.rvs. However, I want to draw the number from an empirical distribution I have. It's a skewed dataset, and I want to incorporate the skew and kurtosis into the distribution that I'm drawing from. Ideally I'd like to just call stats.norm.rvs(loc=blah,scale=blah,size=blah) and then also set the skew and kurtosis in addition to the mean and variance. The norm function takes a 'moments' argument consisting of some arrangement of 'mvsk', where the s and k stand for skew and kurtosis, but apparently all that does is ask for s and k to be computed from the rv, whereas I want to set s and k as parameters of the distribution to begin with.
Anyway, I'm not a statistics expert by any means, so perhaps this is a simple or misguided question. I would appreciate any help.
If the four moments aren't enough to define the distribution well, is there any other way to draw values that are consistent with an empirical distribution that looks like this: http://i.imgur.com/3yB2Y.png
Recommended answer
If you are not worried about getting out into the tails of the distribution, and the data are floating point, then you can sample from the empirical distribution.
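- Sort the data.
- Prepend a 0 to the data.
- Let N denote the length of this data_array.
- Compute q=scipy.rand()*(N-1), so that idx+1 below never runs past the end of the array.
- idx=int(q); di=q-idx
- xlo=data_array[idx], xhi=data_array[idx+1]
- Return xlo+(xhi-xlo)*di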
Basically, this is linearly interpolating in the empirical CDF to obtain the random variates.
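The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the function name sample_empirical is made up for this example, it uses NumPy's Generator instead of scipy.rand, and it leaves out the prepended 0 so that every draw stays between the smallest and largest observed values.

```python
import numpy as np

def sample_empirical(data, size, rng=None):
    """Draw `size` values by linear interpolation between sorted data points.

    Hypothetical helper, not part of SciPy. Needs at least two data points.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size
    # Uniform position along the n - 1 gaps between neighbouring points.
    q = rng.random(size) * (n - 1)
    # Integer part picks the gap; clamp so idx + 1 is always a valid index.
    idx = np.minimum(q.astype(int), n - 2)
    frac = q - idx
    # Interpolate between the two sorted neighbours.
    return xs[idx] + (xs[idx + 1] - xs[idx]) * frac

# Example with a small made-up sample
draws = sample_empirical([3.0, 1.0, 2.0, 10.0], size=5)
```

Because each gap between sorted neighbours is chosen with equal probability, regions where the data are dense are sampled more often, which is what preserves the skew of the original data.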
The two potential problems are (1) if your data set is small, you may not represent the distribution well, and (2) you will not generate a value larger than the largest one in your existing data set.
To get beyond those, you need to look at parametric distributions, like the gamma distribution mentioned above.
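As a rough sketch of the parametric route, one option is to fit a gamma distribution to the data with scipy.stats.gamma.fit and then draw from the fitted distribution. The data below are synthetic stand-ins, and fixing the location at 0 (floc=0) is an assumption that only makes sense for non-negative data; whether a gamma is a good fit for a particular data set has to be checked separately.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic right-skewed data standing in for the real data set.
data = rng.gamma(shape=2.0, scale=3.0, size=5000)

# Fit shape and scale by maximum likelihood, with the location fixed at 0.
a, loc, scale = stats.gamma.fit(data, floc=0)

# Draw new values from the fitted distribution; unlike the empirical
# approach, these can exceed the largest observed value.
draws = stats.gamma.rvs(a, loc=loc, scale=scale, size=1000, random_state=rng)

# Compare the sample skewness of the data and of the new draws.
print(stats.skew(data), stats.skew(draws))
```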