Can I display missing data as an extra factor in Seaborn? I have been googling for a while.
Here is the simple code I am using:
ax = sns.boxplot(data=df, x=x, y=y)
For value_counts, there is an option like dropna:
df['bla'].value_counts(dropna = False)
But I can't find an equivalent for the boxplot. Thanks.
Best answer
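No, you can't.
At least not directly with seaborn.
Issues related to NaN values have been opened in seaborn for lineplot or pairplot. However, a ticket from 2014 seems to indicate that seaborn has ignored missing values since 0.4. This can be confirmed in seaborn's source code, categorical.py: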
box_data = remove_na(group_data)
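To illustrate what that line implies, here is a minimal sketch that uses plain pandas instead of seaborn's internal helper (the sample values are made up, and `notna()` filtering is only assumed to be a rough equivalent of `remove_na`). The statistics a box is drawn from are computed on the non-missing values only, so the NaN entries leave no trace in the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical group of y values containing two missing entries
group_data = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, 5.0])

# Assumed rough equivalent of dropping NA values before computing box statistics
box_data = group_data[group_data.notna()]

n_total = len(group_data)
n_used = len(box_data)

# Quartiles computed from the remaining values only
quartiles = box_data.quantile([0.25, 0.5, 0.75])

print(f"{n_used} of {n_total} values contribute to the box")
print(quartiles)
```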
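The best I can think of is to create an additional categorical column that represents the valid/invalid status of your column data.
Then I would make 2 subplots:
- a countplot showing the number of valid/invalid values for the column you care about
- some regular seaborn plot based on that column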
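The status column described above could be built along these lines. This is only a sketch: the column names `group`, `value` and `value_status` and the data are hypothetical, and it stops at computing the counts that a countplot of the status column would display.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=12),  # hypothetical x column
    "value": rng.normal(size=12),                   # hypothetical y column
})
df.loc[[1, 4, 7], "value"] = np.nan  # inject a few missing values

# Extra categorical column describing the status of `value`
df["value_status"] = np.where(df["value"].isna(), "missing", "valid")

# Overall counts, and counts broken down by group
status_counts = df["value_status"].value_counts()
per_group = pd.crosstab(df["group"], df["value_status"])

print(status_counts)
print(per_group)
```

Using string labels rather than True/False keeps the axis of the count plot readable without extra tick formatting.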
Additionally, the boxplot can be annotated to show the number of points taken into account for each box.
A similar manipulation could be done for the countplot.
Another way would be to use the value_counts information and add it as an annotation.
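As a rough sketch of how such annotation text might be prepared (the column names and data are hypothetical, and `groupby` is used here in place of `value_counts` so that valid and missing counts stay aligned per group):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "c"],
    "value": [1.0, np.nan, 3.0, np.nan, np.nan, 2.5],
})

# count() skips NaN while size() does not, so the difference is the number missing
grouped = df.groupby("group")["value"]
summary = pd.DataFrame({
    "n": grouped.count(),
    "missing": grouped.size() - grouped.count(),
})

# One label per group, in the same (sorted) order as the groupby result
labels = [f"n: {row.n} (NaN: {row.missing})" for row in summary.itertuples()]
print(labels)
```

Each label could then be placed above its box with `ax.text`, provided the boxes are drawn in the same group order as `summary`.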
Example:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
def custom(val):
    # Replace non-negative values with NaN to simulate missing data
    if val >= 0.0:
        return np.nan
    return val
df = pd.DataFrame(np.random.randn(500, 3))
df = df.rename(index=int, columns={0: 'col_1', 1: 'col_2', 2: 'col_3'})
df['four'] = 'bar'
df['five'] = df['col_1'] > 0
df['category'] = pd.cut(df['col_2'], bins=3, labels=['titi', 'tata', 'toto'])
df['col_3'] = df['col_1'].apply(custom)
df['is_col_3_na'] = pd.isna(df['col_3'])
fig, (ax1, ax2) = plt.subplots(1, 2)
validdf = df[(df['is_col_3_na'] == False)].copy()
sns.countplot(data=df, x='is_col_3_na', ax=ax1).set_title('col_3 valid/invalid data ratios')
sns.boxplot(data=validdf, x='category', y='col_3',
            # hue="category",
            ax=ax2)
print(df['is_col_3_na'].describe())
print(df['is_col_3_na'].value_counts())
# start: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
# with proper modifications
# Calculate number of obs per group & median to position labels.
# Both come from the same groupby so they follow the category order of the boxes
# (value_counts() would sort by frequency and could mislabel the boxes).
grouped = validdf.groupby('category', observed=False)['col_3']
medians = grouped.median().values
nobs = ["n: " + str(n) for n in grouped.count().values]
# Add it to the plot
for tick in range(len(nobs)):
    ax2.text(tick, medians[tick] + 0.03, nobs[tick],
             horizontalalignment='center', size='x-small', color='b', weight='semibold')
# end: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
plt.show()
Output:
Console output (for the 'col_3' column):
count      500
unique       2
top       True
freq       254
Name: is_col_3_na, dtype: object
True     254
False    246
Name: is_col_3_na, dtype: int64
Source: "python - Including NA in seaborn boxplot", Stack Overflow: https://stackoverflow.com/questions/53338630/