
Problem description

I have a dataset in which the same variable is stored in different columns for each subject. I want to merge those columns into one.

For example, I have this data frame with two DVs (DV1 and DV2), each of which is spread across three columns (A, B, C) depending on the subject.

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"))

How can I merge them into just two columns, so that the result is:

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"), DV_1 = c(1,4,5), DV_2 = c(3,3,5))


Recommended answer

You can use coalesce from dplyr:

library(dplyr)

df %>%
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C),
         DV_2 = coalesce(DV2_A, DV2_B, DV2_C))
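
To make the behaviour of coalesce concrete, here is a minimal sketch on plain vectors. The vector names (a, b, c3, merged, first_wins) are made up for illustration and do not come from the question.

```r
library(dplyr)

# Three partially filled vectors, mirroring DV1_A, DV1_B and DV1_C:
# each position holds a value in exactly one of them.
a  <- c(1, NA, NA)
b  <- c(NA, 4, NA)
c3 <- c(NA, NA, 5)

# coalesce() goes through its arguments from left to right and keeps,
# position by position, the first value that is not NA.
merged <- coalesce(a, b, c3)
print(merged)

# When several arguments have a value at the same position,
# the leftmost one is kept.
first_wins <- coalesce(c(1, NA), c(2, 3))
print(first_wins)
```

The argument order therefore only matters when more than one column is filled in for the same row, which is not the case in the data from the question.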

If you have a lot of DV columns to combine, you might not want to type all the column names. In that case, you can first grep the column names for each DV, convert each name to a symbol with rlang::syms, and then splice (!!!) the symbols into coalesce (following advice from @hadley):

library(rlang)
var_quo1 = syms(grep("DV1", names(df), value = TRUE))
var_quo2 = syms(grep("DV2", names(df), value = TRUE))

df %>%
  mutate(DV_1 = coalesce(!!! var_quo1),
         DV_2 = coalesce(!!! var_quo2))
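
As a quick sanity check, the sketch below compares the spliced call with the hand-typed one on the data from the question. The names dv1_names, dv1_syms, explicit and spliced are illustrative, and the anchored pattern "^DV1_" is an assumption about the column naming scheme (it avoids also matching a column such as DV10_A).

```r
library(dplyr)
library(rlang)

# Data from the question (FACT kept as a character column here).
df <- data.frame(
  ID = c(1, 2, 3),
  DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
  DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
  FACT = c("A", "B", "C")
)

# Column names belonging to DV1, then the same names as a list of symbols.
dv1_names <- grep("^DV1_", names(df), value = TRUE)
dv1_syms  <- syms(dv1_names)

# Typing the columns by hand ...
explicit <- df %>% mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C))

# ... and splicing the symbols into the call should give the same column.
spliced <- df %>% mutate(DV_1 = coalesce(!!!dv1_syms))

print(identical(explicit$DV_1, spliced$DV_1))
```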

If instead you have a ton of DVs, you might not even want to type all the coalesce lines. In that case, you can write a function that outputs one merged DV column for a given DV number, and then use lapply + bind_cols to put all of them together:

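DV_combine = function(num_DVs){
  # Name of the new column, e.g. "DV_1", matching the result shown below
  DV_name = sym(paste0("DV_", num_DVs))
  # Anchor the pattern so that "DV1" does not also match "DV10", "DV11", ...
  DV_syms = syms(grep(paste0("^DV", num_DVs, "_"), names(df), value = TRUE))

  # Return a one-column data frame holding the merged DV
  df %>%
    transmute(!!DV_name := coalesce(!!! DV_syms))
}

# Build one merged column per DV and append them all to df
bind_cols(df, lapply(1:2, DV_combine))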
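
If the DV numbers themselves should not be hard-coded, one possible variation is to read them off the column names. This is only a sketch of a different technique (a plain loop with do.call instead of rlang splicing), and it assumes every DV column is named "DV<number>_<suffix>"; dv_cols, dv_nums and cols are illustrative names.

```r
library(dplyr)

# Data from the question (FACT kept as a character column here).
df <- data.frame(
  ID = c(1, 2, 3),
  DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
  DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
  FACT = c("A", "B", "C")
)

# All columns that follow the assumed "DV<number>_<suffix>" scheme.
dv_cols <- grep("^DV[0-9]+_", names(df), value = TRUE)

# The distinct DV numbers found in those names, here "1" and "2".
dv_nums <- unique(sub("^DV([0-9]+)_.*$", "\\1", dv_cols))

for (num in dv_nums) {
  # Columns belonging to this DV only.
  cols <- grep(paste0("^DV", num, "_"), names(df), value = TRUE)
  # Pass the columns to coalesce() as separate, unnamed arguments.
  df[[paste0("DV_", num)]] <- do.call(coalesce, unname(as.list(df[cols])))
}

print(df)
```

The loop modifies df in place, whereas the function above leaves df untouched and returns new columns; which of the two is preferable depends on the rest of the pipeline.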

Result:

  ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV_1 DV_2
1  1     1    NA    NA     3    NA    NA    A    1    3
2  2    NA     4    NA    NA     3    NA    B    4    3
3  3    NA    NA     5    NA    NA     5    C    5    5

Note:

This method works for both numeric and character columns, but not for factors. Convert any factor columns to character before using this method.
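
The conversion can be done in base R before merging. The sketch below uses a small made-up data frame (df_fct with columns G_A and G_B), not the data from the question, to show the idea on columns that are actually factors.

```r
library(dplyr)

# Hypothetical example: the same grouping variable stored as a factor
# in two different columns.
df_fct <- data.frame(
  ID  = c(1, 2),
  G_A = factor(c("x", NA)),
  G_B = factor(c(NA, "y"))
)

# Find the factor columns and convert only those to character.
is_fct <- vapply(df_fct, is.factor, logical(1))
df_fct[is_fct] <- lapply(df_fct[is_fct], as.character)

# The columns are now plain character vectors and can be merged as before.
df_fct$G <- coalesce(df_fct$G_A, df_fct$G_B)
print(df_fct)
```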

Data:

df = structure(list(ID = c(1, 2, 3), DV1_A = c(1, NA, NA), DV1_B = c(NA,
4, NA), DV1_C = c(NA, NA, 5), DV2_A = c(3, NA, NA), DV2_B = c(NA,
3, NA), DV2_C = c(NA, NA, 5), FACT = structure(1:3, .Label = c("A",
"B", "C"), class = "factor")), .Names = c("ID", "DV1_A", "DV1_B",
"DV1_C", "DV2_A", "DV2_B", "DV2_C", "FACT"), row.names = c(NA,
-3L), class = "data.frame")


05-30 19:45