Problem description
How can I collapse my data frame, in which many observations span multiple rows but have at most one value for each of several different variables?
This is what I have:
id title info               var1      var2      var3
1  foo   Some string here   string 1
1  foo   Some string here             string 2
1  foo   Some string here                       string 3
2  bar   A different string string 4  string 5
2  bar   A different string                     string 6
3  baz   Something else     string 7            string 8
This is what I want:
id title info               var1      var2      var3
1  foo   Some string here   string 1  string 2  string 3
2  bar   A different string string 4  string 5  string 6
3  baz   Something else     string 7            string 8
I think I have it with:
ddply(merged, .(id, title, info), summarize, var1 = max(var1), var2 = max(var2), var3 = max(var3))
But the problem is that there are many more of these var1-var3 variables, and they are generated programmatically. I therefore need a way to insert var1 = max(var1), etc. programmatically, based on a list of the variable names.
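A minimal base R sketch of that idea (an illustration, not part of the original question or answer): turn each variable name into the unevaluated call max(name), then evaluate those calls inside every id/title/info group. It assumes empty cells are stored as empty strings (""); merged is rebuilt from the example table above, and key_vars, value_vars, exprs and collapse_group are hypothetical names.

```r
# Example input rebuilt from the table above (assumption: empty cells are "")
merged <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  title = c("foo", "foo", "foo", "bar", "bar", "baz"),
  info  = c(rep("Some string here", 3), rep("A different string", 2), "Something else"),
  var1  = c("string 1", "", "", "string 4", "", "string 7"),
  var2  = c("", "string 2", "", "string 5", "", ""),
  var3  = c("", "", "string 3", "", "string 6", "string 8"),
  stringsAsFactors = FALSE
)

key_vars   <- c("id", "title", "info")
# Hypothetical: in practice this vector would come from the code that generates the columns
value_vars <- setdiff(names(merged), key_vars)

# Build list(var1 = max(var1), var2 = max(var2), ...) as unevaluated calls
exprs <- setNames(lapply(value_vars, function(v) call("max", as.name(v))), value_vars)

# Evaluate every call inside one group; max() on character vectors returns the
# lexicographically largest value, and "" sorts before any non-empty string
collapse_group <- function(g) {
  vals <- lapply(exprs, eval, envir = g)
  data.frame(g[1, key_vars, drop = FALSE], vals, stringsAsFactors = FALSE)
}

# One data frame per id/title/info combination, then bind the collapsed rows
groups    <- split(merged, merged[key_vars], drop = TRUE)
collapsed <- do.call(rbind, lapply(groups, collapse_group))
collapsed <- collapsed[order(collapsed$id), ]
rownames(collapsed) <- NULL
print(collapsed)
```

Because exprs is an ordinary named list, the same calls could also be spliced into other summarising functions; a group whose values are all empty keeps "" here, matching the blank cell in the desired output.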
Recommended answer
There are many possible ways to achieve this; here are two.
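Define a helper function that extracts the non-empty value from each column (NA if the group has none):
Myfunc <- function(x) {
  x <- x[!is.na(x) & x != ""]                  # drop empty strings and NAs
  if (length(x) > 0) x[1] else NA_character_   # always return exactly one value
}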
Using data.table
library(data.table)
# setDT() converts df to a data.table in place; .SD holds the non-grouping columns
setDT(df)[, lapply(.SD, Myfunc), by = list(id, title, info)]
#    id title               info     var1     var2     var3
# 1:  1   foo   Some string here string 1 string 2 string 3
# 2:  2   bar A different string string 4 string 5 string 6
# 3:  3   baz     Something else string 7       NA string 8
Or similarly, with dplyr:
library(dplyr)
# across() applies Myfunc to every non-grouping column
# (summarise_each() and funs() are deprecated in current dplyr)
df %>%
  group_by(id, title, info) %>%
  summarise(across(everything(), Myfunc), .groups = "drop")
#   id title info               var1     var2     var3
# 1  1 foo   Some string here   string 1 string 2 string 3
# 2  2 bar   A different string string 4 string 5 string 6
# 3  3 baz   Something else     string 7 NA       string 8
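If neither package is available, the same collapse can be sketched in base R with aggregate(). This is an illustration under stated assumptions rather than part of the original answer: empty cells are taken to be "" or NA, and df, first_nonempty, key_vars and value_vars are hypothetical names.

```r
# Example input rebuilt from the question (assumption: empty cells are "")
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  title = c("foo", "foo", "foo", "bar", "bar", "baz"),
  info  = c(rep("Some string here", 3), rep("A different string", 2), "Something else"),
  var1  = c("string 1", "", "", "string 4", "", "string 7"),
  var2  = c("", "string 2", "", "string 5", "", ""),
  var3  = c("", "", "string 3", "", "string 6", "string 8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-empty, non-NA value, or NA if there is none
first_nonempty <- function(x) {
  x <- x[!is.na(x) & x != ""]
  if (length(x) > 0) x[1] else NA_character_
}

key_vars   <- c("id", "title", "info")
value_vars <- setdiff(names(df), key_vars)  # works for any number of generated columns

# aggregate() applies the helper to each value column within each key combination
collapsed <- aggregate(df[value_vars], by = df[key_vars], FUN = first_nonempty)
collapsed <- collapsed[order(collapsed$id), ]  # aggregate's own group order is not by id
rownames(collapsed) <- NULL
print(collapsed)
```

Selecting the columns by name (df[value_vars]) is what removes the need to write var1 = ..., var2 = ... by hand.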