Question
I want to use dplyr for some data manipulation. Background: I have a survey weight and a bunch of variables (mostly Likert items). I want to compute the frequencies and percentages per category, both with and without the survey weight.
As an example, let us just use frequencies for the gender variable. The result should look like this:
gender  freq  freq.weighted
1       292   922.2906
2       279   964.7551
9       6     21.7338
I will do this for many variables, so I decided to put the dplyr code inside a function. That way I only have to change the variable and can type less.
library(dplyr)
library(lazyeval)  # provides interp()

# example data
gender<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2")
survey_weight<-c("2.368456","2.642901","2.926698","3.628653","3.247463","3.698195","2.776772","2.972387","2.686365","2.441820","3.494899","3.133106","3.253514","3.138839","3.430597","3.769577","3.367952","2.265350","2.686365","3.189538","3.029999","3.024567","2.972387","2.730978","4.074495","2.921552","3.769577","2.730978","3.247463","3.230097")
test_dataframe<-data.frame(gender,survey_weight)
# function
weighting.function <- function(dataframe, variable) {
  test_weighted <- dataframe %>%
    group_by_(variable) %>%
    summarise_(interp(freq = count(~weight)),
               interp(freq_weighted = sum(~weight)))
  return(test_weighted)
}

result_dataframe <- weighting.function(test_dataframe, "gender")

# this second step was left out in this example:
# mutate_(perc = interp(~freq / sum(~freq) * 100), perc_weighted = interp(~freq_weighted / sum(~freq_weighted) * 100))
This leads to the following error message:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "formula"
I have tried a lot of different things. First, I used freq = n() to count the frequencies, but I always got an error. (I checked that plyr was loaded before dplyr and not afterwards; that did not help either.)
Any ideas? I read the vignette on standard evaluation, but I keep running into problems and have no idea what a solution could be.
Recommended answer
I think you have a few nested mistakes that are causing your problems. The biggest one is using count() instead of summarise(): count() is a verb that operates on a whole data frame, not a summary function to call inside summarise_(). I'm guessing you wanted n():
weighting.function <- function(dataframe, variable) {
  dataframe %>%
    group_by_(variable) %>%
    summarise_(
      freq = ~n(),
      freq_weighted = ~sum(survey_weight)
    )
}

weighting.function(test_dataframe, ~gender)
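The underscore verbs (group_by_(), summarise_(), mutate_()) have been deprecated since dplyr 0.7, so on a recent installation the same idea is usually written with tidy evaluation. The following is only a sketch under a few assumptions: it needs dplyr 1.0 or later, the helper name weighted_freq and its weight argument are hypothetical, and the toy data are made up for illustration. It also folds in the percentage step that the question left out. Note that the question's example stores survey_weight as quoted strings; those values would have to be converted to numeric before sum() can work on them.

```r
library(dplyr)

# Made-up toy data; the weights are stored as numbers so that sum() works.
toy <- data.frame(
  gender = c("2", "2", "1", "2", "1", "9"),
  survey_weight = c(2.5, 3.0, 1.5, 2.0, 3.5, 4.0)
)

# Hypothetical helper: `variable` and `weight` are column names given as strings.
weighted_freq <- function(dataframe, variable, weight = "survey_weight") {
  dataframe %>%
    # group by the column whose name is stored in `variable`
    group_by(across(all_of(variable))) %>%
    summarise(
      freq = n(),                            # unweighted count per category
      freq_weighted = sum(.data[[weight]]),  # sum of weights per category
      .groups = "drop"
    ) %>%
    # percentages are computed over all categories after ungrouping
    mutate(
      perc = freq / sum(freq) * 100,
      perc_weighted = freq_weighted / sum(freq_weighted) * 100
    )
}

result <- weighted_freq(toy, "gender")
print(result)
```

Because the column names are passed as plain strings, the helper can be applied to many variables in a loop or with lapply(), which is what the question was aiming for.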
You also had a few unneeded uses of interp(). If you do use interp(), the call should look like freq = interp(~n()), i.e. the name is outside the call to interp(), and the thing being interpolated starts with ~.
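To illustrate that shape, here is a small sketch of interp() building the weighted-sum expression when the weight column name is only known as a string. It assumes the lazyeval package is installed; the variable weight_col and the placeholder w are hypothetical names chosen for this example.

```r
library(lazyeval)

# Hypothetical: the weight column name arrives as a string.
weight_col <- "survey_weight"

# The result name (freq_weighted) stays outside interp(), and the
# expression being interpolated starts with ~. The placeholder `w`
# is replaced by the column name.
freq_weighted <- interp(~sum(w), w = as.name(weight_col))
print(freq_weighted)
```

The resulting formula could then be passed as freq_weighted = freq_weighted inside summarise_(), in the same position as the ~sum(survey_weight) formula in the answer's function.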