Problem description
I have written a function in Python that prints output similar to PROC SUMMARY, but I need some help improving the output.
Here is the code:
import numpy as np
import pandas as pd

def wmean_grouped2(group, var_name_in, var_name_weight):
    # Weighted mean of var_name_in within one group, weighted by var_name_weight.
    d = group[var_name_in]
    w = group[var_name_weight]
    return (d * w).sum() / w.sum()

# Maps a function name to the aggregation applied to each group.
FUNCS = {"mean": np.mean,
         "sum": np.sum,
         "count": np.count_nonzero}

def my_summary2(
        data,
        var_names_in,
        var_names_out,
        var_functions,
        var_name_weight=None,
        var_names_group=None):
    result = pd.DataFrame()
    if var_names_group is None:
        # No grouping columns: put every row into a single group.
        grouped = data.groupby(lambda x: True)
    else:
        grouped = data.groupby(var_names_group)
    for var_name_in, var_name_out, var_function in \
            zip(var_names_in, var_names_out, var_functions):
        if var_function == "wsum":
            # Note: "wsum" computes a weighted mean via wmean_grouped2.
            func = lambda x: wmean_grouped2(x, var_name_in, var_name_weight)
            result[var_name_out] = pd.Series(grouped.apply(func))
        else:
            func = FUNCS[var_function]
            result[var_name_out] = grouped[var_name_in].apply(func)
    # Append a 'Total' row holding the sum of every numeric column.
    # np.number replaces pd.np.number, since pd.np was removed in pandas 2.0.
    result.loc['Total'] = result.select_dtypes(np.number).sum()
    return result
And below is the code that calls the function above:
print(my_summary2(
    data=df,
    var_names_in=["sal", "sal"],
    var_names_out=["COUNT", "SAL"],
    var_functions=["count", "sum"],
    var_name_weight="val_1",
    var_names_group=["name", "age"]
))
Here I am grouping on two columns, "name" and "age", and this is the output I get:
            COUNT  SAL
(Arik, 32)      1  100
(David, 44)     2  260
(John, 33)      1  200
(John, 34)      1  300
(Peter, 33)     1  100
Total           6  960
In the output, the two grouping columns are printed together in parentheses, with no "name" and "age" column headers. I want to get the following output instead:
name   age  COUNT  SAL
----------------------
Arik    32      1  100
David   44      2  260
John    33      1  200
John    34      1  300
Peter   33      1  100
----------------------
Total           6  960
What I have tried:
Everything I have tried, and what I need, is described in the section above.
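One possible direction, offered as a minimal sketch and not as a definitive fix: the group keys can be turned into ordinary columns with reset_index(), so that "name" and "age" get their own headers, and the Total row can then be appended separately. The sample DataFrame and the helper name summary_with_headers below are hypothetical, since the question does not show df; the values were chosen only so that the totals match the output above. The sketch also uses pandas' built-in "count" (which counts non-null values) in place of np.count_nonzero, which is an assumption about what COUNT is meant to be.

```python
import pandas as pd

# Hypothetical sample data; the original df is not shown in the question.
df = pd.DataFrame({
    "name": ["Arik", "David", "David", "John", "John", "Peter"],
    "age": [32, 44, 44, 33, 34, 33],
    "sal": [100, 110, 150, 200, 300, 100],
})

def summary_with_headers(data, group_cols, value_col):
    # Named aggregation: one output column per keyword (COUNT and SAL).
    result = (
        data.groupby(group_cols)[value_col]
        .agg(COUNT="count", SAL="sum")
        .reset_index()  # turn the group keys into regular columns
    )
    # Build the Total row: key columns are blank except for the label.
    total = {col: "" for col in group_cols}
    total[group_cols[0]] = "Total"
    total["COUNT"] = result["COUNT"].sum()
    total["SAL"] = result["SAL"].sum()
    return pd.concat([result, pd.DataFrame([total])], ignore_index=True)

summary = summary_with_headers(df, ["name", "age"], "sal")
# index=False hides the 0..n row labels when printing.
print(summary.to_string(index=False))
```

This does not draw the dashed separator lines from the desired output; those would have to be added when printing, for example by splitting the string returned by to_string() into lines and inserting the dashes around the data rows.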