Question
I have a dataset in which the same variable is stored in different columns depending on the subject. I want to merge those columns into a single column per variable.
For example, I have the data frame below. It contains two DVs (DV1 and DV2), and each one is stored in a different column (A, B or C) depending on the subject:
data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"))
How can I merge them into just two columns, so that the result is:
data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"), DV_1 = c(1,4,5), DV_2 = c(3,3,5))
Answer
You can use coalesce from dplyr:
library(dplyr)
df %>%
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C),
         DV_2 = coalesce(DV2_A, DV2_B, DV2_C))
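To make the behaviour of coalesce concrete, here is a minimal sketch on plain vectors (x, y and z are illustrative stand-ins for DV1_A, DV1_B and DV1_C). At each position it keeps the first value that is not NA, scanning its arguments from left to right, so when more than one column is filled for the same row the leftmost one wins.

```r
library(dplyr)

# Illustrative stand-ins for DV1_A, DV1_B and DV1_C:
# each position has exactly one non-missing value.
x <- c(1, NA, NA)
y <- c(NA, 4, NA)
z <- c(NA, NA, 5)

# Position by position, keep the first non-NA value across x, y, z.
dv1 <- coalesce(x, y, z)
print(dv1)

# When several arguments are non-NA at the same position,
# the leftmost argument wins.
first_wins <- coalesce(c(1, 2), c(9, 9))
print(first_wins)
```

This is why the approach only makes sense when each subject has at most one filled column per DV; otherwise values from later columns are silently ignored.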
If you have a lot of DV columns to combine, you might not want to type all the column names. In this case, you can first grep the column names for each DV, convert the names to symbols with rlang::syms, and then splice (!!!) the symbols into coalesce (advice from @hadley):
library(rlang)
# anchored patterns, so that "DV1" cannot also match DV10_*, DV11_*, ...
var_quo1 = syms(grep("^DV1_", names(df), value = TRUE))
var_quo2 = syms(grep("^DV2_", names(df), value = TRUE))
df %>%
  mutate(DV_1 = coalesce(!!! var_quo1),
         DV_2 = coalesce(!!! var_quo2))
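One thing to watch when building the column list with grep: an unanchored pattern such as "DV1" also matches columns like DV10_A once there are ten or more DVs. The sketch below uses a small hypothetical data frame, wide, to show the difference, and checks that splicing the symbols gives the same result as typing coalesce(DV1_A, DV1_B) by hand.

```r
library(dplyr)
library(rlang)

# Hypothetical data where DV1 and DV10 columns coexist.
wide <- data.frame(
  DV1_A  = c(1, NA), DV1_B  = c(NA, 2),
  DV10_A = c(7, NA), DV10_B = c(NA, 8)
)

# Unanchored: also picks up DV10_A and DV10_B.
loose <- grep("DV1", names(wide), value = TRUE)
print(loose)

# Anchored on the start of the name and the "_" separator:
# only DV1's own columns are selected.
dv1_cols <- grep("^DV1_", names(wide), value = TRUE)

# Splicing the symbols expands to coalesce(DV1_A, DV1_B).
out <- wide %>%
  mutate(DV_1 = coalesce(!!! syms(dv1_cols)))
print(out$DV_1)
```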
If instead you have a ton of DVs, you might not even want to type all the coalesce lines. In that case, you can write a function that returns one combined DV column for a given DV number, and then use lapply + bind_cols to put them all together:
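# returns a one-column data frame DV_<n> built from all DV<n>_* columns of df
DV_combine = function(num_DVs){
  DV_name = sym(paste0("DV_", num_DVs))
  # anchor the pattern so that DV1 does not also match DV10_*, DV11_*, ...
  DV_syms = syms(grep(paste0("^DV", num_DVs, "_"), names(df), value = TRUE))
  df %>%
    transmute(!!DV_name := coalesce(!!! DV_syms))
}

# bind the combined columns next to the original data
bind_cols(df, lapply(1:2, DV_combine))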
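As an alternative (a different technique from the answer's, shown only as a sketch), the DV numbers can be discovered from the column names instead of being passed in as 1:2, and each group can be collapsed with base R's Reduce, which applies coalesce pairwise: coalesce(coalesce(DVn_A, DVn_B), DVn_C). The helper name combine_dv_columns is hypothetical, and it assumes every DV column is named DV<number>_<suffix>.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): adds one DV_<n> column per
# "DV<n>_" prefix, holding the first non-NA value across that prefix's columns.
combine_dv_columns <- function(data) {
  dv_cols <- grep("^DV[0-9]+_", names(data), value = TRUE)
  # Extract the DV number: "DV1_A" -> "1", "DV10_C" -> "10"
  dv_ids <- sub("^DV([0-9]+)_.*$", "\\1", dv_cols)
  # Column names grouped by DV number
  groups <- split(dv_cols, dv_ids)
  for (id in names(groups)) {
    # Reduce() folds coalesce() over the group's columns, left to right
    data[[paste0("DV_", id)]] <- Reduce(coalesce, data[groups[[id]]])
  }
  data
}

df <- data.frame(
  ID = c(1, 2, 3),
  DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
  DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
  FACT = c("A", "B", "C")
)
res <- combine_dv_columns(df)
print(res)
```

Because the output columns are named DV_<n>, they do not match the "^DV[0-9]+_" pattern, so running the helper twice does not fold the combined columns back in.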
Result:
  ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV_1 DV_2
1  1     1    NA    NA     3    NA    NA    A    1    3
2  2    NA     4    NA    NA     3    NA    B    4    3
3  3    NA    NA     5    NA    NA     5    C    5    5
Note:
This method works for both numeric and character columns, but not for factors. Convert any factor columns to character before using it.
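A minimal sketch of that conversion, using a small illustrative data frame f with two factor columns. The conversion here is plain base R, so it does not depend on the dplyr version; with dplyr 1.0 or later, mutate(across(where(is.factor), as.character)) does the same job.

```r
library(dplyr)

# Illustrative data: the same variable split across two factor columns.
f <- data.frame(
  V_A = factor(c("x", NA)),
  V_B = factor(c(NA, "y"))
)

# Turn every factor column into character, leave other columns as they are.
f[] <- lapply(f, function(col) if (is.factor(col)) as.character(col) else col)

# Now coalesce works as with any character columns.
f$V <- coalesce(f$V_A, f$V_B)
print(f$V)
```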
Data:
df = structure(list(ID = c(1, 2, 3), DV1_A = c(1, NA, NA), DV1_B = c(NA,
4, NA), DV1_C = c(NA, NA, 5), DV2_A = c(3, NA, NA), DV2_B = c(NA,
3, NA), DV2_C = c(NA, NA, 5), FACT = structure(1:3, .Label = c("A",
"B", "C"), class = "factor")), .Names = c("ID", "DV1_A", "DV1_B",
"DV1_C", "DV2_A", "DV2_B", "DV2_C", "FACT"), row.names = c(NA,
-3L), class = "data.frame")