


I see that the pandas library has a describe function (df.describe()) which returns some useful statistics. However, is there a way to add additional rows to the output, such as standard deviation (.std), mean absolute deviation (.mad) or the count of unique values?


I know about df.describe(), but I can't find out how to add these additional summary statistics:



import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))
df.describe()


                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
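If the extra rows you need are only additional quantiles, describe() can already produce them through its percentiles argument. A minimal sketch on the same kind of random data (so the numbers will differ from the table above):

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; values are random.
df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))

# Ask describe() for the 5th and 95th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.05, 0.95])
print(summary)
```

The median (50%) row is always included, even when 0.5 is not in the list.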


Updated for pandas 0.20
I'd write my own describe wrapper like the one below. To add more statistics, just extend the list passed to agg.

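import pandas as pd

def describe(df, stats):
    d = df.describe()
    # Compute the extra stats on the columns describe() covered and stack them below.
    return pd.concat([d, df[d.columns].agg(stats)])

# 'mad' only exists in pandas < 2.0; drop it from the list on newer versions.
describe(df, ['skew', 'mad', 'kurt'])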

                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
mad      0.267730    0.249968    0.254351    0.228558    0.242874
kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
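A sketch of the same idea that also runs on pandas 2.x, where the 'mad' string and DataFrame.mad() no longer exist: pass a dict that maps each new row label to a function applied column by column. The helper names describe_plus and mean_abs_dev are hypothetical, and the nunique row covers the "count of unique values" part of the question.

```python
import numpy as np
import pandas as pd


def mean_abs_dev(s: pd.Series) -> float:
    # Mean absolute deviation around the mean, i.e. what the old
    # DataFrame.mad() computed before it was removed in pandas 2.0.
    return (s - s.mean()).abs().mean()


def describe_plus(df: pd.DataFrame, extra: dict) -> pd.DataFrame:
    """Hypothetical helper: df.describe() plus one row per entry in `extra`.

    `extra` maps a row label to a function that takes one column (a Series)
    and returns a scalar.
    """
    base = df.describe()
    cols = df[base.columns]  # only the columns describe() summarised
    # One column per statistic, then transpose so each statistic becomes a row.
    rows = pd.DataFrame({name: cols.apply(func) for name, func in extra.items()}).T
    return pd.concat([base, rows])


df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))
out = describe_plus(df, {
    'mad': mean_abs_dev,
    'nunique': pd.Series.nunique,  # number of distinct values per column
    'skew': pd.Series.skew,
})
print(out)
```

Using a dict rather than a list of names keeps the row labels explicit and lets you mix built-in Series methods with your own functions.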


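Alternatively, build each extra row explicitly and concatenate:

def describe(df):
    return pd.concat([df.describe().T,
                      (df - df.mean()).abs().mean().rename('mad'),  # mean absolute deviation
                      df.skew().rename('skew'),
                      df.kurt().rename('kurt'),
                     ], axis=1).T

describe(df)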


                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
mad      0.267730    0.249968    0.254351    0.228558    0.242874
skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
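For the count of unique values, describe() itself reports a unique row when it summarises non-numeric columns, for example with include='all'. A small sketch on a made-up mixed-type frame (the column names x and label are illustrative only):

```python
import pandas as pd

# Illustrative mixed-type frame: one numeric column and one text column.
df = pd.DataFrame({
    'x': [1.0, 2.0, 2.0, 4.0],
    'label': ['a', 'b', 'b', 'c'],
})

# include='all' summarises every column; non-numeric columns get
# 'unique', 'top' and 'freq' rows, and statistics that do not apply are NaN.
summary = df.describe(include='all')
print(summary)
```

Numeric columns get NaN in the unique, top and freq rows, so for them you still need something like df.nunique() added as an extra row.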


10-24 12:44