
Problem description


I have written a function in Python that prints output like SAS PROC SUMMARY, but I need some help improving the output.

Here is the code:


import numpy as np
import pandas as pd


def wmean_grouped2(group, var_name_in, var_name_weight):
    # Weighted mean of var_name_in within one group, weighted by var_name_weight
    d = group[var_name_in]
    w = group[var_name_weight]
    return (d * w).sum() / w.sum()


# Map the function names accepted by my_summary2 to NumPy functions
FUNCS = {"mean": np.mean,
         "sum": np.sum,
         "count": np.count_nonzero}


def my_summary2(
        data,
        var_names_in,
        var_names_out,
        var_functions,
        var_name_weight=None,
        var_names_group=None):

    result = pd.DataFrame()

    # With no grouping columns, put every row into one single group
    if var_names_group is None:
        grouped = data.groupby(lambda x: True)
    else:
        grouped = data.groupby(var_names_group)

    for var_name_in, var_name_out, var_function in \
            zip(var_names_in, var_names_out, var_functions):
        if var_function == "wsum":
            # "wsum" is computed with the weighted-mean helper above
            func = lambda x: wmean_grouped2(x, var_name_in, var_name_weight)
            result[var_name_out] = pd.Series(grouped.apply(func))
        else:
            func = FUNCS[var_function]
            result[var_name_out] = grouped[var_name_in].apply(func)

    # Append a Total row that sums every numeric column
    result.loc['Total'] = result.select_dtypes(np.number).sum()

    return result
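For reference, the "wsum" option routes to wmean_grouped2, which computes a weighted mean (the sum of d * w divided by the sum of w) rather than a weighted sum. A quick sanity check against NumPy's np.average, using a hypothetical two-row group with made-up sal and val_1 values:

```python
import numpy as np
import pandas as pd


def wmean_grouped2(group, var_name_in, var_name_weight):
    # Same formula as in the question: sum(d * w) / sum(w)
    d = group[var_name_in]
    w = group[var_name_weight]
    return (d * w).sum() / w.sum()


# Hypothetical two-row group, for illustration only
g = pd.DataFrame({"sal": [100, 160], "val_1": [1, 3]})

print(wmean_grouped2(g, "sal", "val_1"))          # 145.0
print(np.average(g["sal"], weights=g["val_1"]))   # 145.0
```

Both lines agree because (100 * 1 + 160 * 3) / (1 + 3) = 580 / 4 = 145.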

And below is the code that calls the function above:

print(my_summary2(
    data=df,
    var_names_in=["sal", "sal"],
    var_names_out=["COUNT", "SAL"],
    var_functions=["count", "sum"],
    var_name_weight="val_1",
    var_names_group=['name', 'age']
))

Here I am grouping on two columns, "name" and "age", and below is the output I am getting:

             COUNT  SAL
(Arik, 32)       1  100
(David, 44)      2  260
(John, 33)       1  200
(John, 34)       1  300
(Peter, 33)      1  100
Total            6  960

In the output, both grouping columns are printed together as a tuple in parentheses, and there are no "name" and "age" column headers. I want to get the following output:

name   age  COUNT  SAL
----------------------
Arik    32      1  100
David   44      2  260
John    33      1  200
John    34      1  300
Peter   33      1  100
----------------------
Total           6  960

What I have tried:

I have described above what I have tried and what I need.
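The (name, age) pairs in the output are index labels: grouping on two columns gives the groupby result a two-level MultiIndex, and "name" and "age" are stored as the index level names rather than as columns. A minimal sketch with hypothetical sample rows (not the asker's real df) shows this, and shows that reset_index() moves those levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical sample rows with the same column names as the question
df = pd.DataFrame({
    "name": ["Arik", "David", "David", "John", "John", "Peter"],
    "age": [32, 44, 44, 33, 34, 33],
    "sal": [100, 120, 140, 200, 300, 100],
})

grouped_sum = df.groupby(["name", "age"])["sal"].sum()

# Grouping on two columns yields a two-level MultiIndex; the
# "name" and "age" headers live in the index level names.
print(type(grouped_sum.index).__name__)  # MultiIndex
print(list(grouped_sum.index.names))     # ['name', 'age']

# reset_index() turns the index levels back into ordinary columns.
flat = grouped_sum.reset_index()
print(list(flat.columns))                # ['name', 'age', 'sal']
```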

Solution
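A possible approach, offered as a sketch under stated assumptions rather than a definitive fix: build the per-group table, call reset_index() so name and age become ordinary columns, append the Total row with pd.concat, and print with DataFrame.to_string(index=False) so the index is hidden. The helpers summary_table and print_summary are hypothetical names, the sample df is made-up data chosen so the group totals match the question's output, and only the COUNT and SAL columns from the question's call are handled:

```python
import numpy as np
import pandas as pd


def summary_table(data, group_cols, value_col):
    """Hypothetical helper: COUNT and SAL per group as columns, plus a Total row."""
    summary = (
        data.groupby(group_cols)[value_col]
        .agg(COUNT=np.count_nonzero, SAL="sum")  # same functions as the question's FUNCS
        .reset_index()  # group keys become columns instead of a tuple-like index
    )
    # Total row: label in the first group column, blanks in the other group columns
    total = {col: "" for col in group_cols}
    total[group_cols[0]] = "Total"
    total["COUNT"] = summary["COUNT"].sum()
    total["SAL"] = summary["SAL"].sum()
    return pd.concat([summary, pd.DataFrame([total])], ignore_index=True)


def print_summary(table):
    """Print without the index, with dashed rules around the group rows."""
    lines = table.to_string(index=False).splitlines()
    rule = "-" * len(lines[0])
    print("\n".join([lines[0], rule, *lines[1:-1], rule, lines[-1]]))


# Hypothetical sample data whose group totals match the question's output
df = pd.DataFrame({
    "name": ["Arik", "David", "David", "John", "John", "Peter"],
    "age": [32, 44, 44, 33, 34, 33],
    "sal": [100, 120, 140, 200, 300, 100],
})

table = summary_table(df, ["name", "age"], "sal")
print_summary(table)
```

The dashed rules are sized to the header line that to_string produces, so the exact column spacing follows pandas' own formatting. The same idea could be folded into my_summary2 by calling result.reset_index() before adding the Total row and building that row with an explicit label in the first grouping column.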



09-01 16:01