How to randomly sample lognormal data in Python using the inverse CDF and specify target percentiles?

I'm trying to generate random samples from a lognormal distribution in Python; the application is simulating network traffic. I'd like to generate samples such that:

- The modal sample result is 320 (~10^2.5)
- 80% of the samples lie within the range 100 to 1000 (10^2 to 10^3)

My strategy is to use the inverse CDF (or Smirnov transform, I believe):

1. Use the PDF for a normal distribution centred around 2.5 to calculate the PDF for 10^x where x ~ N(2.5, sigma).
2. Calculate the CDF for the above distribution.
3. Generate random uniform data along the interval 0 to 1.
4. Use the inverse CDF to transform the random uniform data into the required range.

The problem is, when I calculate the 10th and 90th percentiles at the end, I get completely the wrong numbers.

Here is my code:

```python
%matplotlib inline
import matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
from scipy.stats import norm

# find value of mu and sigma so that 80% of data lies within range 2 to 3
mu=2.505
sigma = 1/2.505
norm.ppf(0.1, loc=mu,scale=sigma),norm.ppf(0.9, loc=mu,scale=sigma)
# output: (1.9934025, 3.01659743)

# Generate normal distribution PDF
x = np.arange(16,128000, 16) # linearly spaced here, with extra range so that CDF is correctly scaled
x_log = np.log10(x)
mu=2.505
sigma = 1/2.505
y = norm.pdf(x_log,loc=mu,scale=sigma)
fig, ax = plt.subplots()
ax.plot(x_log, y, 'r-', lw=5, alpha=0.6, label='norm pdf')

x2 = (10**x_log) # x2 should be linearly spaced, so that cumsum works (later)
fig, ax = plt.subplots()
ax.plot(x2, y, 'r-', lw=5, alpha=0.6, label='norm pdf')
ax.set_xlim(0,2000)

# Calculate CDF
y_CDF = np.cumsum(y) / np.cumsum(y).max()
fig, ax = plt.subplots()
ax.plot(x2, y_CDF, 'r-', lw=2, alpha=0.6, label='norm pdf')
ax.set_xlim(0,8000)

# Generate random uniform data
input = np.random.uniform(size=10000)

# Use CDF as lookup table
traffic = x2[np.abs(np.subtract.outer(y_CDF, input)).argmin(0)]

# Discard highs and lows
traffic = traffic[(traffic >= 32) & (traffic <= 8000)]

# Check percentiles
np.percentile(traffic,10),np.percentile(traffic,90)
```

Which produces the output:

```
(223.99999999999997, 2480.0000000000009)
```

... and not the (100, 1000) that I would like to see. Any advice appreciated!

Solution

First, I'm not sure about "Use the PDF for a normal distribution centred around 2.5". After all, the log-normal is defined in terms of the base-e logarithm (the natural log), which means 320 ≈ 10^2.505 ≈ e^5.77.

Second, I would approach the problem in a different way. You need m and s to sample from a log-normal.

If you look at the Wikipedia article on the log-normal distribution, you will see that it is a two-parameter distribution. And you have exactly two conditions:

- Mode = exp(m - s*s) = 320
- 80% of samples in [100, 1000] => CDF(1000, m, s) - CDF(100, m, s) = 0.8

where the CDF is expressed via the error function (a very common function found in any library).

So there are two non-linear equations for two parameters. Solve them to find m and s, and plug those into any standard log-normal sampler.
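The two conditions in the answer can be solved numerically. The following is a minimal sketch, not the answerer's own code: it assumes SciPy is available, substitutes the mode condition into the interval condition so that only a one-dimensional root find over s remains, and writes the log-normal CDF through `scipy.stats.norm.cdf` rather than the error function directly. The helper names (`m_from_s`, `mass_error`) and the root bracket `[0.05, 3.0]` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

MODE = 320.0              # target mode, from the question
LO, HI = 100.0, 1000.0    # target interval, from the question
MASS = 0.8                # target probability mass inside [LO, HI]


def m_from_s(s):
    # Mode condition: exp(m - s**2) = MODE  =>  m = ln(MODE) + s**2
    return np.log(MODE) + s**2


def mass_error(s):
    # Interval condition: CDF(HI) - CDF(LO) - MASS, where the log-normal CDF
    # is the standard normal CDF evaluated at (ln x - m) / s.
    m = m_from_s(s)
    inside = norm.cdf((np.log(HI) - m) / s) - norm.cdf((np.log(LO) - m) / s)
    return inside - MASS


# Assumed bracket: mass_error is positive at s = 0.05 and negative at s = 3.0.
s = brentq(mass_error, 0.05, 3.0)
m = m_from_s(s)

# Inverse-CDF sampling: push uniform draws through the log-normal quantile
# function exp(m + s * Phi^{-1}(u)).
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
samples = np.exp(m + s * norm.ppf(u))

frac_inside = np.mean((samples >= LO) & (samples <= HI))
print(m, s, frac_inside)
```

Note that these two conditions fix the mode and the total mass inside [100, 1000], but not how the remaining 20% is split between the two tails. With the mode pinned at 320, the 10th and 90th percentiles of the samples will generally not be 100 and 1000: requiring the mode and both percentiles at once would impose three conditions on a two-parameter distribution.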
