Question
In R I can create the desired output with:
data = c(rep(1.5, 7), rep(2.5, 2), rep(3.5, 8),
rep(4.5, 3), rep(5.5, 1), rep(6.5, 8))
plot(density(data, bw=0.5))
In Python (with matplotlib), the closest I got was a simple histogram:
import matplotlib.pyplot as plt
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
plt.hist(data, bins=6)
plt.show()
I also tried the normed=True parameter, but that got me no further than trying to fit a Gaussian to the histogram.
My latest attempts were around scipy.stats and gaussian_kde, following examples on the web, but I've been unsuccessful so far.
Recommended answer
Sven has shown how to use the gaussian_kde class from SciPy, but you will notice that the result doesn't look quite like what you generated with R. This is because gaussian_kde tries to infer the bandwidth automatically. You can adjust the bandwidth, in a way, by replacing the covariance_factor function of the gaussian_kde class. First, here is what you get without changing that function:
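As a rough sketch of that default behaviour (not part of the original answer), the snippet below builds the estimator without touching covariance_factor and inspects the factor it picks. It assumes a SciPy version where gaussian_kde defaults to Scott's rule, which for one-dimensional data is n ** (-1/5). The names default_kde, scott_by_hand and default_curve are only illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8

# With no bw_method argument, gaussian_kde chooses the bandwidth itself
# (assumed here: Scott's rule, the documented default).
default_kde = gaussian_kde(data)

# The factor the estimator actually uses, and Scott's rule computed by hand
# for 1-D data: n ** (-1/5), where n is the number of data points.
default_factor = default_kde.covariance_factor()
scott_by_hand = len(data) ** (-1.0 / 5.0)
print(default_factor, scott_by_hand)

# Evaluate the default density on a grid; plt.plot(xs, default_curve) draws it.
xs = np.linspace(0, 8, 200)
default_curve = default_kde(xs)
```

The two printed values should agree, which is where the "about .5" figure for this data comes from.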
However, if I use the following code:
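import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = gaussian_kde(data)
# Grid of x values on which to evaluate the density
xs = np.linspace(0, 8, 200)
# Override the automatic bandwidth factor with a smaller, fixed one
density.covariance_factor = lambda: .25
# Recompute the kernel covariance so the new factor takes effect
density._compute_covariance()
plt.plot(xs, density(xs))
plt.show()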
I get a plot that is pretty close to what you are getting from R. What have I done? gaussian_kde uses a changeable function, covariance_factor, to calculate its bandwidth. Before I changed the function, the value returned by covariance_factor for this data was about .5. Lowering it lowered the bandwidth. I had to call _compute_covariance after changing that function so that all of the factors would be recalculated correctly. This isn't an exact correspondence with the bw parameter from R, but hopefully it points you in the right direction.
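For a closer match to R's bw, one possible approach (a sketch, not something the original answer does) relies on two assumptions: that bw in R's density() is the standard deviation of the Gaussian kernel, and that gaussian_kde scales the sample covariance (computed with ddof=1) by the square of its factor. Under those assumptions, passing bw / sample standard deviation as bw_method gives a kernel with the requested width, with no need to override covariance_factor or call the private _compute_covariance. The names r_bw, factor and kernel_sd are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = np.array([1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8)

# Bandwidth used in the R call: density(data, bw=0.5)
r_bw = 0.5

# gaussian_kde multiplies the sample standard deviation (ddof=1) by its
# factor, so dividing by that standard deviation targets a kernel sd of r_bw.
factor = r_bw / data.std(ddof=1)

# A scalar bw_method is used directly as the factor.
density = gaussian_kde(data, bw_method=factor)

# Check: the kernel standard deviation the estimator ended up with.
kernel_sd = float(np.sqrt(density.covariance[0, 0]))
print(factor, kernel_sd)

# Evaluate on a grid; plt.plot(xs, ys) draws the curve as in the code above.
xs = np.linspace(0, 8, 200)
ys = density(xs)
```

For this data the computed factor comes out close to the hand-picked .25, which is consistent with that value giving a plot similar to R's.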