Problem description
I would like to compare a set of distributions of scores (score), grouped by some categories (centrality) and colored by some other category (model). I've tried the following with seaborn:
plt.figure(figsize=(14,6))
seaborn.boxplot(x="centrality", y="score", hue="model", data=data, palette=seaborn.color_palette("husl", len(models) +1))
seaborn.despine(offset=10, trim=True)
plt.savefig("/home/i11/staudt/Eval/properties-replication-test.pdf", bbox_inches="tight")
There are some problems with this plot:
- There are a large number of outliers, and I don't like how they are drawn here. Can I remove them? Can I change their appearance to reduce clutter? Can I at least color them so that their color matches the box color?
- The model value original is special because all other distributions should be compared to the distribution of original. This should be visually reflected in the plot. Can I make original the first box of every group? Can I offset or mark it differently somehow? Would it be possible to draw a horizontal line through the median of each original distribution and through the group of boxes?
- Some of the values of score are very small. How do I scale the y-axis properly to show them?
Edit:
Here is an example with a log-scaled y-axis, which is also not yet ideal. Why do some boxes seem cut off at the low end?
Recommended answer
Outlier display
You should be able to pass any arguments to seaborn.boxplot that you can pass to plt.boxplot (see the documentation), so you can adjust the display of the outliers by setting flierprops. Below are some examples of what you can do with your outliers.
If you don't want to display them, you could do
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
showfliers=False)
or you could make them light gray like so:
flierprops = dict(markerfacecolor='0.75', markersize=5,
linestyle='none')
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
flierprops=flierprops)
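To make the flierprops idea concrete, here is a minimal, self-contained sketch that reduces outlier clutter with small, semi-transparent markers. The DataFrame is synthetic (only the column names score, centrality and model come from the question; the category values and the random scores are made up for illustration), and the specific marker settings are just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn

# Synthetic stand-in for the question's data; the values are assumptions.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "centrality": np.repeat(["degree", "betweenness"], 200),
    "model": np.tile(np.repeat(["original", "other"], 100), 2),
    "score": rng.lognormal(mean=0.0, sigma=1.0, size=400),
})

# Small, light gray, semi-transparent outlier markers to reduce clutter.
flierprops = dict(marker="o", markersize=3, markerfacecolor="0.75",
                  markeredgecolor="0.75", alpha=0.5, linestyle="none")

ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
                     flierprops=flierprops)
```

The keys in flierprops are ordinary matplotlib line properties, so other marker shapes or colors can be swapped in the same way.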
Order of groups
You can set the order of the groups manually with hue_order, e.g.
seaborn.boxplot(x="centrality", y="score", hue="model", data=data,
hue_order=["original", "Havel..","etc"])
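If the list of models is long, the hue_order list could also be built programmatically instead of being typed out. The helper below is a hypothetical convenience function (original_first is not part of seaborn), and the model names are placeholders. It assumes the reference name actually occurs among the models.

```python
def original_first(models, reference="original"):
    """Return the distinct model names with `reference` first, the rest sorted."""
    others = sorted(m for m in set(models) if m != reference)
    return [reference] + others

# Hypothetical model names, for illustration only.
hue_order = original_first(["model_b", "original", "model_a", "model_b"])
print(hue_order)  # ['original', 'model_a', 'model_b']
```

The result can then be passed as hue_order=original_first(data["model"]) in the seaborn.boxplot call, so that original is always the first box in each group.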
Scaling of the y-axis
You could just get the minimum and maximum of all y-values and set the y-axis limits accordingly. Something like this:
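import numpy as np
y_values = data["score"].values
ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data)
# boxplot has no y_lim argument; set the limits on the returned Axes instead
ax.set_ylim(np.min(y_values), np.max(y_values))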
Edit: This last point doesn't really make sense, since the automatic y-axis range will already include all the values, but I'm leaving it as an example of how to adjust these settings. As mentioned in the comments, log-scaling probably makes more sense.
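As a rough sketch of the log-scaling suggestion, the Axes returned by seaborn.boxplot can be switched to a logarithmic y-axis with matplotlib's set_yscale. The data here is synthetic and strictly positive. That matters because zero or negative values cannot be drawn on a log axis, which is one possible (unconfirmed) reason for boxes that look cut off at the low end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn

# Synthetic, strictly positive scores spanning several orders of magnitude (assumed data).
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "centrality": np.repeat(["degree", "betweenness"], 100),
    "model": np.tile(np.repeat(["original", "other"], 50), 2),
    "score": rng.lognormal(mean=-6.0, sigma=2.0, size=200),
})

ax = seaborn.boxplot(x="centrality", y="score", hue="model", data=data)
ax.set_yscale("log")  # logarithmic y-axis so very small scores stay visible

# Base the lower limit on the smallest positive value so it is valid on a log axis.
positive = data.loc[data["score"] > 0, "score"]
ax.set_ylim(positive.min() / 2, data["score"].max() * 2)
```

If the real data contains zeros, they would have to be filtered out or shifted before plotting on a log axis. Which of the two is appropriate depends on what the scores mean.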