I have the following code to plot some histograms about the subjects in a database:
import matplotlib.pyplot as plt
attr_info = {
'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}
bin_info = {key: None for key in attr_info}
bin_info['Age'] = 10
for name, a_info in attr_info.items():
    plt.figure(num=name)
    counts, bins, _ = plt.hist(a_info, bins=bin_info[name], color='blue', edgecolor='black')
    plt.margins(0)
    plt.title(name)
    plt.xlabel(name)
    plt.ylabel("# Subjects")
    plt.yticks(range(0, 11, 2))
    plt.grid(axis='y')
    plt.tight_layout(pad=0)
plt.show()
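The code works, but it plots the distribution of each attribute in a separate histogram. What I want to achieve is something like this:
[image: the desired plot]
I know plt.hist has a stacked parameter, but that seems to be designed for a slightly different use, namely stacking the same attribute for different types of subjects. For example, you could plot a histogram in which each whole bar represents an age range, and the bar itself is a stack of smokers in one color and non-smokers in another. I haven't been able to figure out how to use it to stack different attributes, each within its own bar, and label them correctly as in the image.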
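For reference, the use of stacked described above (one attribute, split by subject type) might look like the sketch below. It reuses the attr_info data from the question; the helper names split_by and plot_age_by_smoker are made up for illustration and are not part of Matplotlib.

```python
attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}


def split_by(values, groups):
    """Group `values` by the parallel list `groups` (hypothetical helper)."""
    out = {}
    for value, group in zip(values, groups):
        out.setdefault(group, []).append(value)
    return out


def plot_age_by_smoker(attr_info, bins=10):
    """Draw one Age histogram whose bars are stacked by smoker status."""
    # Imported here so the grouping helper above has no plotting dependency.
    import matplotlib.pyplot as plt

    ages = split_by(attr_info['Age'], attr_info['Smoker'])
    fig, ax = plt.subplots()
    # Passing a list of datasets with stacked=True stacks them bin by bin.
    ax.hist([ages['y'], ages['n']], bins=bins, stacked=True,
            label=['smoker', 'non-smoker'], edgecolor='black')
    ax.set_xlabel('Age')
    ax.set_ylabel('# Subjects')
    ax.legend()
    return fig, ax
```

Calling plot_age_by_smoker(attr_info) and then matplotlib.pyplot.show() (or fig.savefig on the returned figure) should display it. Every bar here is still an age bin, which is why this parameter alone does not produce one bar per attribute.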
Best answer
You need to process the data a little, but this can be done without pandas. Also, what you want is a stacked bar chart, not a histogram:
import matplotlib.pyplot as plt
attr_info = {
'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}
# Filter your data for each bar section that you want
ages_0_10 = [x for x in attr_info['Age'] if x < 10]
ages_10_40 = [x for x in attr_info['Age'] if 10 <= x < 40]
ages_40p = [x for x in attr_info['Age'] if x >= 40]  # >= so that an age of exactly 40 is not dropped
gender_m = [x for x in attr_info['Gender'] if x == 'm']
gender_f = [x for x in attr_info['Gender'] if x == 'f']
smoker_y = [x for x in attr_info['Smoker'] if x == 'y']
smoker_n = [x for x in attr_info['Smoker'] if x == 'n']
# Locations for each bar (you can move them around)
locs = [0, 1, 2]
# Age is plotted separately from Gender and Smoker,
# since Age has 3 stacked sections and the others have just 2 each
plt.bar(locs[0], len(ages_0_10), width=0.5)  # This is the bottom section
# Second stacked section; note that bottom is set to the height of the previous one
plt.bar(locs[0], len(ages_10_40), bottom=len(ages_0_10), width=0.5)
# Same as before, but now bottom is the sum of the 2 previous sections
plt.bar(locs[0], len(ages_40p), bottom=len(ages_0_10) + len(ages_10_40), width=0.5)
# Add labels; play around with the locations
# plt.text(x, y, text)
plt.text(locs[0], len(ages_0_10) / 2, r'$<10$')
plt.text(locs[0], len(ages_0_10) + 1, r'$[10, 40)$')
plt.text(locs[0], len(ages_0_10) + 5, r'$\geq 40$')
# Define the top and bottom sections for the Gender and Smoker stacks.
# Both have just 2 stacked sections,
# so we can pass lists instead of plotting each one separately as for Age
tops = [len(gender_m), len(smoker_y)]
bottoms = [len(gender_f), len(smoker_n)]
plt.bar(locs[1:], bottoms, width=0.5)
plt.bar(locs[1:], tops, bottom=bottoms, width=0.5)
# Labels again; the bottom sections hold 'f' and 'n', the top sections 'm' and 'y'
# Gender
plt.text(locs[1], len(gender_f) / 2, 'f')
plt.text(locs[1], len(gender_f) + len(gender_m) / 2, 'm')
# Smokers
plt.text(locs[2], len(smoker_n) / 2, 'n')
plt.text(locs[2], len(smoker_n) + len(smoker_y) / 2, 'y')
# Set tick labels
plt.xticks(locs, ('Age', 'Gender', 'Smoker'))
plt.show()
Result:
[image: the resulting stacked bar chart]
Check the documentation for pyplot.bar and this example.
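The same idea can also be written as a loop, so that the bottom offsets and label positions are computed instead of typed by hand. This is only a sketch under the same assumptions as above (the attr_info data and the three age ranges); stacked_segments and draw_stacked_bars are hypothetical helper names, not Matplotlib functions.

```python
from collections import Counter

attr_info = {
    'Gender': ['m', 'f', 'm', 'm', 'f', 'm', 'm', 'f', 'm', 'f'],
    'Age': [9, 43, 234, 23, 2, 95, 32, 63, 58, 42],
    'Smoker': ['y', 'n', 'y', 'y', 'n', 'n', 'n', 'n', 'y', 'y']
}


def stacked_segments(attr_info):
    """Return {attribute: [(label, count), ...]} listed from bottom to top."""
    ages = attr_info['Age']
    gender = Counter(attr_info['Gender'])
    smoker = Counter(attr_info['Smoker'])
    return {
        'Age': [('<10', sum(a < 10 for a in ages)),
                ('10-39', sum(10 <= a < 40 for a in ages)),
                ('>=40', sum(a >= 40 for a in ages))],
        'Gender': [('f', gender['f']), ('m', gender['m'])],
        'Smoker': [('n', smoker['n']), ('y', smoker['y'])],
    }


def draw_stacked_bars(segments, width=0.5):
    """Draw one stacked bar per attribute and label each section at its center."""
    # Imported here so the counting helper above has no plotting dependency.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for x, parts in enumerate(segments.values()):
        bottom = 0
        for label, count in parts:
            ax.bar(x, count, bottom=bottom, width=width, edgecolor='black')
            if count:  # skip labels for empty sections
                ax.text(x, bottom + count / 2, label, ha='center', va='center')
            bottom += count  # the next section starts where this one ends
    ax.set_xticks(range(len(segments)))
    ax.set_xticklabels(list(segments))
    ax.set_ylabel('# Subjects')
    return fig, ax
```

To view it, call draw_stacked_bars(stacked_segments(attr_info)) and then matplotlib.pyplot.show(). Keeping the counting separate from the drawing makes it easy to check that every bar adds up to the number of subjects before plotting.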
Source: "python - Matplotlib PyPlot stacked histograms - stacking different attributes in each bar", a similar question found on Stack Overflow: https://stackoverflow.com/questions/59053514/