python - Turn a bar chart into a normal distribution

I'm new to Python.

I have two arrays and a nice bar chart:

import matplotlib.pyplot as plt

# Buyers in %
h = [1,1,3,5,9,13,16,16,14,10,5,4,2,1,0]

# Clothes size
x = [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]

# P(X = 40) = 16 %  -> the probability that a buyer takes size 40 is 16 %
# P(37 <= X <= 40) = 5+9+13+16 = 43 %  -> the probability that a buyer takes a size between 37 and 40 is 43 %

plt.ylabel('Buyers % ')
plt.xlabel('Clothes Size')
plt.bar(x, height=h)
plt.grid(True)
plt.show()
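The two probabilities in the comments can be read straight off the table by summing the percentages of the matching sizes. A minimal sketch; the helper name `empirical_prob` is hypothetical and not part of the question:

```python
# Buyers in % for each clothes size (from the question)
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]


def empirical_prob(lo, hi):
    """Hypothetical helper: P(lo <= X <= hi) in %, summed directly from the table."""
    return sum(pct for size, pct in zip(x, h) if lo <= size <= hi)


print(empirical_prob(40, 40))  # P(X = 40)        -> 16
print(empirical_prob(37, 40))  # P(37 <= X <= 40) -> 43
```

This uses no distribution at all, so it reproduces the hand-computed 16 % and 43 % exactly.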

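How can I use seaborn or scipy.stats.norm to turn this into a density line (a normal distribution) and plot it over the bar chart?
After that, how can I use the normal distribution to calculate P(X < 40)?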

Thanks.

Best answer

Using seaborn:

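import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import norm

# Buyers in %
h = [1,1,3,5,9,13,16,16,14,10,5,4,2,1,0]

# Clothes size
x = [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]

# Expand the percentages into a raw sample: size x[i] repeated h[i] times (100 values)
data = np.repeat(x, h)

sns.set()
plt.figure(figsize=(10,5), dpi=300)
# Histogram of the sample with a fitted normal PDF on top
# (note: distplot is deprecated since seaborn 0.11)
sns.distplot(data, fit=norm, kde=False)
plt.show()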
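The seaborn call draws its own histogram instead of reusing the original bars. To put a fitted curve directly on the question's bar chart, one option (a sketch using only matplotlib and `scipy.stats.norm`, not the answer's method) is to estimate the mean and standard deviation with the percentages as weights, then scale the PDF by 100 so it is in the same "% of buyers" units as the bars. The scaling assumes the sizes are exactly one unit apart, as they are here.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Buyers in % per clothes size (from the question)
h = np.array([1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0])
x = np.array([34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48])

# Weighted mean and population standard deviation, percentages as weights;
# equal to mean()/std() of the expanded sample np.repeat(x, h)
mu = np.average(x, weights=h)
sigma = np.sqrt(np.average((x - mu) ** 2, weights=h))

# Smooth grid for the curve; PDF * 100 turns density into "% of buyers per size"
grid = np.linspace(x.min() - 1, x.max() + 1, 300)
curve = norm.pdf(grid, loc=mu, scale=sigma) * 100

plt.bar(x, height=h, label="Buyers %")
plt.plot(grid, curve, color="red", label=f"Normal fit (mu={mu:.2f}, sigma={sigma:.2f})")
plt.ylabel("Buyers % ")
plt.xlabel("Clothes Size")
plt.grid(True)
plt.legend()
plt.show()
```

Weighting by `h` avoids building the 100-element `data` list, and gives the same mean and standard deviation as the expanded sample.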

To get the probability:

import numpy as np
from scipy.stats import norm

sample = np.array(data)
sample_mean = sample.mean()
sample_std = sample.std()  # population std (ddof=0)

# Normal distribution fitted to the sample
dist = norm(sample_mean, sample_std)

# Evaluate the PDF at every integer size within mean +/- 4 std
min_value = int(sample_mean - 4*sample_std)
max_value = int(sample_mean + 4*sample_std)
values = np.arange(min_value, max_value)
probabilities = dist.pdf(values)

#plt.plot(values, probabilities)

def prob(min_lim, max_lim):
    # Sum the PDF over integer sizes strictly between min_lim and max_lim;
    # with unit spacing this approximates the area under the curve
    mask = (values > min_lim) & (values < max_lim)
    return probabilities[mask].sum()

prob(0, 40)

Out[2]: 0.3230891372830226
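Summing PDF values at integer sizes is a discrete approximation of an area. An alternative (a standard technique, not what the answer does) is to read the probability from the normal CDF directly. Because sizes are whole numbers, "X < 40" means "X <= 39", and a continuity correction evaluates the CDF at 39.5. The sketch below assumes the same mean and population standard deviation as above.

```python
import numpy as np
from scipy.stats import norm

# Buyers in % and clothes sizes (from the question), expanded into a sample
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
data = np.repeat(x, h)

dist = norm(data.mean(), data.std())

# P(X < 40) for integer sizes = P(X <= 39); continuity correction -> CDF at 39.5
p_corrected = dist.cdf(39.5)
# Same question without the correction, for comparison
p_raw = dist.cdf(40)
# Share of buyers below size 40, straight from the data
p_empirical = np.mean(data < 40)

print(p_corrected, p_raw, p_empirical)
```

The corrected value should land close to the summed-PDF result, since summing the PDF at integers 30..39 is essentially a midpoint approximation of the area up to 39.5. Evaluating the CDF at 40 without the correction also counts half of the size-40 bar and gives a noticeably larger number.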

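Note: this differs from the value computed directly from the data, because it uses a continuous normal distribution estimated from the sample mean and standard deviation.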

If you don't want to use the continuous estimate, the code is:

len(np.array(data)[np.array(data)<40])/len(data)
Out[2]: 0.32

For python - Turn a bar chart into a normal distribution, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59582648/