Can I get Seaborn to show missing data as an additional factor? I have been googling this for a while.

This is the simple code I am using:

ax = sns.boxplot(data=df, x=x, y=y)


For value_counts there is an option for this, dropna:

df['bla'].value_counts(dropna = False)


But I can't find an equivalent for boxplot. Thanks.

Best answer

No, you can't.
At least not directly with seaborn.
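
One indirect route, assuming the missing values sit in the grouping column (the x variable) rather than in the measured values, is to give them an explicit label before plotting so that they form their own category. The column names group and value and the label 'Missing' are placeholders made up for this sketch:

```python
import numpy as np
import pandas as pd

# Toy data: two rows have no group label.
df = pd.DataFrame({
    "group": ["a", "b", np.nan, "a", np.nan, "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Replace NaN group labels with an explicit category so those rows
# are not silently dropped when the data is grouped.
df["group_filled"] = df["group"].fillna("Missing")

print(df["group_filled"].value_counts().to_dict())
```

Passing x='group_filled' to sns.boxplot should then draw a separate box for the 'Missing' group. This does not help when the NaN values are in the y column, which is the case the rest of this answer deals with.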

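Issues about NaN values have been opened against seaborn for lineplot and pairplot. However, a ticket from 2014 seems to indicate that seaborn has ignored missing values since version 0.4. This can be confirmed in seaborn's source code, in categorical.py: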

box_data = remove_na(group_data)
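
As a rough illustration of what that line implies, the filtering can be reproduced with pandas alone. This is a sketch of equivalent behavior, not seaborn's actual code; the helper name drop_missing and the sample values are made up for the example:

```python
import numpy as np
import pandas as pd

def drop_missing(values):
    """Hypothetical stand-in for the NaN filtering: keep only non-null entries."""
    values = pd.Series(values)
    return values[values.notna()]

group_data = pd.Series([1.0, np.nan, 2.5, np.nan, 4.0])
box_data = drop_missing(group_data)

# The box is computed from the 3 remaining points; the 2 NaN entries leave no trace.
print(len(group_data), len(box_data))  # 5 3
```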


The best I can think of is to create an extra categorical column that records whether the data in the column is valid or invalid.

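Then I would make 2 subplots:
 - a countplot showing the number of valid/invalid values in the column you care about
 - some regular seaborn plot based on that column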
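
The status column for the first subplot might be built like this. It is a minimal sketch: the column name col_3_status and the labels 'valid' / 'missing' are illustrative choices, not something seaborn requires:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_3": [-0.4, np.nan, -1.2, np.nan, np.nan, -0.1]})

# Label each row by whether col_3 holds a usable value.
df["col_3_status"] = pd.Categorical(
    np.where(df["col_3"].isna(), "missing", "valid"),
    categories=["valid", "missing"],
)

# Rows that a boxplot of col_3 would actually use.
valid_df = df[df["col_3_status"] == "valid"]

print(df["col_3_status"].value_counts(sort=False).to_dict())
```

The status column can be handed to sns.countplot as x, while valid_df feeds the regular plot.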

In addition, the boxplot can be annotated to show the number of points taken into account for each box.
Something similar can be done for the bars of the countplot.

Another approach is to take the value_counts information and add it to the plot as an annotation.
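
For instance, the counts could be turned into a short text label. This sketch only builds the string; the label format and the sample data are assumptions, and the counts are taken from the isna mask, which gives the same numbers as calling value_counts on it:

```python
import numpy as np
import pandas as pd

col = pd.Series([-0.4, np.nan, -1.2, np.nan, -0.1], name="col_3")

# Count missing vs. present values.
n_missing = int(col.isna().sum())
n_valid = int(col.notna().sum())

label = f"{col.name}: n={n_valid}, NaN={n_missing}"
print(label)  # col_3: n=3, NaN=2
```

The string can then be passed to ax.set_title(label) or ax.text(...) on the Axes that holds the boxplot.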

Example:

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def custom(val):
    # Turn non-negative values into NaN to simulate missing data
    if val >= 0.0:
        return np.nan  # np.NaN was removed in NumPy 2.0
    return val

df = pd.DataFrame(np.random.randn(500, 3))
df = df.rename(index=int, columns={0: 'col_1', 1: 'col_2', 2: 'col_3'})
df['four'] = 'bar'
df['five'] = df['col_1'] > 0
df['category'] = pd.cut(df['col_2'], bins=3, labels=['titi', 'tata', 'toto'])
df['col_3'] = df['col_1'].apply(custom)
df['is_col_3_na'] = pd.isna(df['col_3'])

fig, (ax1, ax2) = plt.subplots(1, 2)
validdf = df[~df['is_col_3_na']].copy()  # keep only rows where col_3 is present

sns.countplot(data=df, x='is_col_3_na', ax=ax1).set_title('col_3 valid/invalid data ratios')
sns.boxplot(data=validdf, x='category', y='col_3',
            #hue="category",
            ax=ax2)

print(df['is_col_3_na'].describe())
print(df['is_col_3_na'].value_counts())

# start: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
# with proper modifications
# Calculate number of obs per group & median to position labels
# (both come from the same groupby so counts and medians follow the box order)
grouped = validdf.groupby('category', observed=False)['col_3']
medians = grouped.median().values
nobs = ["n: " + str(n) for n in grouped.count().values]

# Add it to the plot
pos = range(len(nobs))
for tick, label in zip(pos, ax2.get_xticklabels()):
    ax2.text(pos[tick], medians[tick] + 0.03, nobs[tick],
             horizontalalignment='center', size='x-small', color='b', weight='semibold')
# end: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
plt.show()


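Output:

[Figure: a countplot of valid/invalid col_3 values (left) next to a boxplot of col_3 per category, annotated with the "n: ..." labels (right)]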

Console output (about the "col_3" column):

count      500
unique       2
top       True
freq       254
Name: is_col_3_na, dtype: object

True     254
False    246
Name: is_col_3_na, dtype: int64

Regarding "python - Including NA in seaborn boxplot", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53338630/
