Changing the names of result variables in a custom dplyr function
Background

To speed up generating grouped summaries across multiple tables, and since I do most of that work inside a dplyr workflow, I've drafted a simple function that generates the desired metrics:

```r
# Function to generate summary table
generate_summary_tbl <- function(dataset, group_column, summary_column) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta
  return(smryDta)
}
```

Example

The function works as desired:

```r
>> mtcars %>%
...   generate_summary_tbl(group_column = am, summary_column = mpg)
# A tibble: 2 x 3
     am     mean   sum
  <dbl>    <dbl> <dbl>
1     0 17.14737 325.8
2     1 24.39231 317.1
```

Problem

I would like to conditionally include the name of the column passed via summary_column = mpg in the results.

Example results, useColName = TRUE

When called with useColName = TRUE, the results should correspond to:

```r
>> mtcars %>%
...   generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
# A tibble: 2 x 3
     am  mean_am sum_am
  <dbl>    <dbl>  <dbl>
1     0 17.14737  325.8
2     1 24.39231  317.1
```

The difference is the presence of the _am suffix in the variable names mean_am, sum_am and so on.

Ugly solution

A partial, ugly solution I have uses setNames:

```r
# Function to generate summary table
generate_summary_tbl <- function(dataset, group_column, summary_column, useColName = TRUE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(mean = mean(!!summary_column),
              sum = sum(!!summary_column)) %>%
    ungroup -> smryDta

  if (useColName) {
    setNames(smryDta,
             c(deparse(substitute(group_column)),
               paste(names(smryDta)[2:length(smryDta)],
                     paste0("_", deparse(substitute(group_column)))))) -> smryDta
  }
  return(smryDta)
}
```

Example

The returned column names almost match the desired results. I reckon I could employ some regex to arrive at the desired names; however, I suspect more efficient solutions are available.

```r
mtcars %>%
  generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
# A tibble: 2 x 3
  `~am` `mean _~am` `sum _~am`
  <dbl>       <dbl>      <dbl>
1     0    17.14737      325.8
2     1    24.39231      317.1
```

How can I get the desired column names, ideally making better use of quo or lazyeval?
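The two artefacts in the names above have separate causes. Once group_column has been reassigned to the result of enquo(), substitute(group_column) returns that quosure rather than the original argument expression, and deparsing the quosure is what yields the formula-style text ~am. Separately, paste() joins its arguments with a single space by default, which is where mean _~am comes from. A minimal sketch of both fixes follows, assuming rlang 0.4.0 or later for as_name(); capture_name() is a hypothetical helper used only for illustration.

```r
library(rlang)

# Hypothetical helper (illustration only): capture a bare column argument
# and return its name as a plain string.
capture_name <- function(col) {
  col <- enquo(col)
  # as_name() extracts the symbol's name from the quosure, so no "~" appears
  as_name(col)
}

capture_name(am)
#> [1] "am"

# paste() uses sep = " " by default; sep = "_" gives the intended form
paste("mean", capture_name(am), sep = "_")
#> [1] "mean_am"
```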
Solution

Maybe use rename:

```r
library(tidyverse)

generate_summary_tbl <- function(dataset, group_column, summary_column, useColName = FALSE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta

  # Prefix every non-grouping column with the grouping column's name
  if (useColName)
    smryDta <- smryDta %>%
      rename_at(
        vars(-one_of(quo_name(group_column))),
        ~paste(quo_name(group_column), .x, sep = "_")
      )
  return(smryDta)
}

mtcars %>% generate_summary_tbl(am, mpg)
# # A tibble: 2 x 3
#      am     mean   sum
#   <dbl>    <dbl> <dbl>
# 1     0 17.14737 325.8
# 2     1 24.39231 317.1

mtcars %>% generate_summary_tbl(am, mpg, TRUE)
# # A tibble: 2 x 3
#      am  am_mean am_sum
#   <dbl>    <dbl>  <dbl>
# 1     0 17.14737  325.8
# 2     1 24.39231  317.1
```

This places the grouping column's name before the metric (am_mean); swapping the first two arguments of paste(), i.e. paste(.x, quo_name(group_column), sep = "_"), gives the mean_am form requested in the question.
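The question also asks whether quosures can be used more directly. One option, sketched below under the assumption of dplyr 0.7 or later (which accepts !! with := to inject a computed name on the left-hand side) and rlang 0.4.0 or later (for as_name()), is to build the output names as strings first and inject them into summarise(), so no renaming step is needed. This version appends the grouping column's name, matching the desired mean_am output; using summary_column instead of group_column in the as_name() call would give mean_mpg, which is closer to the question's wording.

```r
library(dplyr)

# Sketch: compute the output names first, then inject them with !! and :=
generate_summary_tbl <- function(dataset, group_column, summary_column,
                                 useColName = FALSE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  # Suffix taken from the grouping column (e.g. "_am"), or empty when disabled
  suffix <- if (useColName) paste0("_", rlang::as_name(group_column)) else ""
  mean_nm <- paste0("mean", suffix)
  sum_nm <- paste0("sum", suffix)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      !!mean_nm := mean(!!summary_column),
      !!sum_nm := sum(!!summary_column)
    ) %>%
    ungroup()
}

# Columns of the result: am, mean_am, sum_am
mtcars %>% generate_summary_tbl(am, mpg, useColName = TRUE)
```

In dplyr 1.0.0 and later, rename_at() is superseded by rename_with(); with group_name holding the grouping column's name as a string, an equivalent of the rename step in the answer would be rename_with(smryDta, ~ paste(.x, group_name, sep = "_"), -all_of(group_name)).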
09-22 17:18