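I have a 2D dataset and want to plot a 2D histogram in which each cell represents the probability of a data point falling in that cell. To get probabilities, I need to normalize the histogram so that its values sum to 1. Here is an example I took from the histogram2d documentation: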
import numpy as np
import matplotlib.pyplot as plt
# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# setting normed=True in histogram2d doesn't seem to do what I need
# histogram2d puts x along the first axis (rows), but imshow draws rows vertically, so transpose
H = H.T
fig = plt.figure(figsize=(7, 3))
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()
Resulting plot
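First, np.sum(H) gives a value like 86. I want each cell to represent the probability of the data falling in that bin, so the cells should all sum to 1. Also, how do I draw a legend that maps color intensity to its value with imshow? Thanks!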
Best answer
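Try using the normed parameter (named density in current NumPy releases, where normed has been removed). According to the docs, the values in H are then computed as bin_count / sample_count / bin_area. So we compute the area of each bin and multiply it by H to get the probability of each bin. Note that the raw counts sum to less than 100 (86 in your run) because points falling outside the bin edges are not counted.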
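import numpy as np
import matplotlib.pyplot as plt
# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
# density=True (spelled normed=True in older NumPy) returns bin_count / sample_count / bin_area
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of every bin: outer product of the bin widths along x and along y
areas = np.outer(np.diff(xedges), np.diff(yedges))
# density times area is the probability of each bin
P = H * areas
fig = plt.figure(figsize=(7, 3))
# transpose because histogram2d puts x along the first axis (rows)
im = plt.imshow(P.T, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
# the colorbar maps color intensity to the probability value
plt.colorbar(im)
plt.show()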
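As a sanity check on the approach above, here is a minimal sketch (not part of the original answer) that computes the per-bin probabilities two ways and compares them: once as density times bin area, and once by dividing the raw counts by their total. The seeded generator and the names counts, density, prob_from_counts and prob_from_density are assumptions introduced only for illustration.

```python
import numpy as np

# Bin edges from the example above (non-uniform widths).
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

# Route 1: raw counts divided by the number of points that landed in some bin.
counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
prob_from_counts = counts / counts.sum()

# Route 2: density (count / in-range total / bin area) multiplied by bin area.
density, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))
prob_from_density = density * areas

# Both routes should agree, and the probabilities should sum to 1.
print(np.allclose(prob_from_counts, prob_from_density))
print(prob_from_counts.sum())
```

Both routes normalize by the number of points that fall inside the bin edges. If the probabilities should instead be relative to all samples, including those outside the edges, dividing counts by len(x) would be the variant to use, and the result would then sum to less than 1.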
Regarding python - 2D histogram normalized for probability, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50939778/