Problem description
I'm trying to analyze the wine-quality dataset. There are two datasets: the red wine dataset and the white wine dataset. I combine them to form wine_df. I want to plot it, giving the red histogram bars a red color and the white histogram bars a white color. But for some bars, the label and the color are inconsistent. For example, the fourth one's label is (4, white), while its color is red. What should I do? Thanks for your answer!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
red_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-red.csv',
                       sep=';')
white_wine = pd.read_csv('https://raw.githubusercontent.com/nishanthgandhidoss/Wine-Quality/master/data/winequality-white.csv',
                         sep=';')
## Add a column to each data to identify the wine color
red_wine['color'] = 'red'
white_wine['color'] = 'white'
## Combine the two dataframes
wine_df = pd.concat([red_wine, white_wine])
colors = ['red','white']
plt.style.use('ggplot')
counts = wine_df.groupby(['quality', 'color']).count()['pH']
counts.plot(kind='bar', title='Counts by Wine Color and quality', color=colors, alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.show()
Recommended answer
The colors are a level of your index, so use that level to specify the colors. Change your line of code to:
counts.plot(kind='bar', title='Counts by Wine Color and quality',
            color=counts.index.get_level_values(1), alpha=.7)
In this case it just happens that matplotlib can interpret the values in your index as colors. In general, you could map the unique values to recognizable colors, for instance:
color = counts.index.get_level_values(1).map({'red': 'green', 'white': 'black'})
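To make the mapping step concrete, here is a minimal sketch on a small made-up Series. The counts, the `toy_counts` name, and the `palette` dictionary are invented for illustration and are not taken from the wine data. It pulls the color level out of the index and turns it into one color per bar:

```python
import pandas as pd

# Toy counts with a (quality, color) MultiIndex; the numbers are invented
idx = pd.MultiIndex.from_tuples(
    [(3, 'red'), (3, 'white'), (4, 'red'), (4, 'white'), (9, 'white')],
    names=['quality', 'color'])
toy_counts = pd.Series([10, 20, 53, 163, 5], index=idx)

# Level 1 of the index holds the wine color of every bar, in bar order
level_colors = toy_counts.index.get_level_values(1)

# Map each label to whichever matplotlib color you want for it
palette = {'red': 'green', 'white': 'black'}
bar_colors = list(level_colors.map(palette))
print(bar_colors)
```

Because the list has exactly one entry per row of the Series, it stays aligned with the bars even when a (quality, color) pair is missing from the data.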
pandas is doing something with the plotting order, but you could always fall back to matplotlib to cycle the colors more reliably. The trick here is to convert color to a categorical variable so that every (quality, color) pair is represented after the groupby, even a pair with no rows. The bars then strictly alternate red, white, red, white, which lets you specify only the list ['red', 'white']:
import matplotlib.pyplot as plt
wine_df['color'] = wine_df.color.astype('category')
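# observed=False keeps (quality, color) pairs that have no rows; newer pandas versions no longer do this by default
counts = wine_df.groupby(['quality', 'color'], observed=False).count()['pH'].fillna(0)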
ind = np.arange(len(counts))
plt.bar(ind, height=counts.values, color=['red', 'white'])
_ = plt.xticks(ind, counts.index.values, rotation=90)
plt.ylim(0, 150)  # So we can see (9, white)
plt.show()
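As a rough illustration of why the categorical conversion matters, the sketch below uses a tiny invented frame (not the wine data; `toy` and its values are hypothetical) in which quality 9 has no red wine. With a categorical grouper and `observed=False`, the groupby should still emit a (9, red) row with a count of 0, so a two-color list keeps alternating correctly:

```python
import pandas as pd

# Tiny invented frame: quality 9 has a white wine but no red wine
toy = pd.DataFrame({
    'quality': [3, 3, 9],
    'color': ['red', 'white', 'white'],
    'pH': [3.2, 3.1, 3.4],
})
toy['color'] = toy['color'].astype('category')

# observed=False keeps category combinations that never occur in the data
toy_counts = toy.groupby(['quality', 'color'], observed=False)['pH'].count()
print(toy_counts)

# Each quality now has a red entry followed by a white entry
labels = list(toy_counts.index)
```

Without the categorical conversion, the (9, red) row would be absent and every bar after the gap would pick up the wrong color from a cycled two-element list.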