python - Seaborn distplot: data does not match the probability

I generated a cumulative Seaborn distplot with the following code:

AlphaGraphCum = sns.distplot(dfControl["alpha"],
             hist_kws={'cumulative': True},
             kde_kws={'cumulative': True}, rug=False, hist=False);
sns.distplot(dfGoal["alpha"],
             hist_kws={'cumulative': True},
             kde_kws={'cumulative': True, 'linestyle':'--'}, rug=False, hist=False);
sns.distplot(dfGraph["alpha"],
             hist_kws={'cumulative': True},
             kde_kws={'cumulative': True, 'linestyle':':'}, rug=False, hist=False);
sns.distplot(dfGoalGraph["alpha"],
             hist_kws={'cumulative': True},
             kde_kws={'cumulative': True, 'linestyle':'-.'}, rug=False, hist=False)


AlphaGraphCum.set(xlabel='Alpha')
AlphaGraphCum.set(ylabel='Cumulative Probability')

#AlphaGraphCum.set_xlim(-1,1)

The x-axis of the plot ranges from -2 to +2. However, when I inspect the data, the minimum is -1 and the maximum is +1. So I tried to limit the axis with:

AlphaGraphCum.set_xlim(-1,1)

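This is the line I commented out in the example above. With it, the x-axis is correctly limited to the range -1 to +1. However, none of the lines shows a y value of 1.0 at x = +1. Since +1 is the maximum, the cumulative probability there should equal 1.0.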

Does anyone know why this is not the case? Any hints would be much appreciated. Thanks!

Best answer

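distplot in Seaborn uses a kde (kernel density estimate) to give you an approximate density of the dataset. It assumes there is a small "micro kernel" around each data point and adds them all up to create an overall "macro kernel". The kernels around the min and max will therefore always extend past those limits, because the data points on the edges (min and max) are the centers of their "micro kernels". (Note: "micro/macro kernel" are terms I made up here for the explanation.)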
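To make the "sum of micro kernels" idea concrete, here is a minimal hand-rolled sketch of a Gaussian kde using only the standard library. The function name `gaussian_kde_pdf` and the bandwidth of 2.0 are assumptions for illustration; seaborn chooses its own bandwidth and this is not its actual implementation.

```python
import math
import random

def gaussian_kde_pdf(x, data, bandwidth):
    """Density at x: the average of Gaussian kernels centred on each data point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

random.seed(0)
data = [random.randint(-10, 10) for _ in range(100)]  # integers in [-10, 10]
bandwidth = 2.0  # assumed value, chosen only for this illustration

lo, hi = min(data), max(data)
density_above_max = gaussian_kde_pdf(hi + 1, data, bandwidth)
density_below_min = gaussian_kde_pdf(lo - 1, data, bandwidth)

# Both values are positive: the kernels centred on the edge points
# spill over the observed min and max.
print(lo, hi)
print(density_above_max, density_below_min)
```

The estimated density outside the observed range is small but not zero, which is why the plotted curve runs past the data limits.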

Let's assume our data ranges from -10 to 10, as shown below.

import numpy as np
import pandas as pd

df = pd.DataFrame().assign(a=np.random.randint(-10, 11, 100))
print(df.a.min(), df.a.max())

Out:
-10 10

If we plot distplot with the default settings (where kde is True),

import seaborn as sns
sns.distplot(df.a)

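it shows both the histogram, which is bounded between -10 and 10, and the kde that approximates that histogram (and which extends past those bounds, for the reason given above).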

Now, if you want the cumulative density, it is calculated from the kde, as below:

sns.distplot(df.a, kde_kws={'cumulative': True})

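At this point, note that the two tails of the kde (blue line) in the first plot and of the cumulative kde (blue line) in the second plot correspond to each other.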
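The same correspondence can be checked numerically. The sketch below is a hand-rolled cumulative Gaussian kde (the function name `gaussian_kde_cdf` and the bandwidth of 2.0 are assumptions for illustration, not seaborn's internals). It shows that the cumulative curve is still below 1 at the data maximum and only approaches 1 some distance beyond it.

```python
import math
import random

def gaussian_kde_cdf(x, data, bandwidth):
    """Cumulative kde at x: the average of the Gaussian kernel CDFs."""
    def phi(z):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sum(phi((x - xi) / bandwidth) for xi in data) / len(data)

random.seed(0)
data = [random.randint(-10, 10) for _ in range(100)]
bandwidth = 2.0  # assumed value, chosen only for this illustration

hi = max(data)
at_max = gaussian_kde_cdf(hi, data, bandwidth)
beyond_max = gaussian_kde_cdf(hi + 5 * bandwidth, data, bandwidth)

# Each point sitting exactly at the maximum contributes only 0.5 there,
# so the cumulative kde cannot reach 1.0 at x = max(data).
print(at_max)
print(beyond_max)
```

Limiting the x-axis to the data range therefore cuts the curve off before it has had the chance to reach 1.0.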

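You might doubt whether the tails really correspond, since the y scales of the first and second plots differ. So if we zoom in on the y-axis of the second plot, it looks like this: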

import matplotlib.pyplot as plt
sns.distplot(df.a, kde_kws={'cumulative': True})
plt.ylim([0, 0.07])

Now the first and third plots look similar. The only difference is that the first is the kde, while the third is the cumulative kde.

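Long story short, what you are plotting is an "approximate cumulative density" based on the kde. That is why the distribution (and the cumulative distribution) is wider than the actual data (the histogram).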
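If what you want is a curve that reaches exactly 1.0 at the maximum, the quantity to plot is the empirical cumulative distribution rather than the cumulative kde. A minimal sketch, with `ecdf` as a hypothetical helper name:

```python
import random

def ecdf(x, data):
    """Empirical CDF: the fraction of observations less than or equal to x."""
    return sum(1 for xi in data if xi <= x) / len(data)

random.seed(0)
data = [random.randint(-10, 10) for _ in range(100)]

at_max = ecdf(max(data), data)            # every observation is <= the maximum
below_min = ecdf(min(data) - 1, data)     # no observation lies below the minimum

print(at_max)     # 1.0
print(below_min)  # 0.0
```

For plotting, the cumulative histogram (`hist=True` with `hist_kws={'cumulative': True}`) is bounded by the data in the same way. Newer seaborn releases (0.11 and later, as far as I know) also provide `sns.ecdfplot`, which draws this step function directly.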

Hope this helps.

Edit: adding the cumulative histogram vs. the cumulative kde

sns.distplot(df.a,
             hist_kws={'cumulative': True},
             kde_kws={'cumulative': True, 'linestyle':'-.'},
             bins=100)

Regarding python - Seaborn distplot: data does not match the probability, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52054168/
