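Warning: this question seems very simple, but as a beginner I could not find a suitable solution in the more complex SO threads (I looked here, here, and here, among other places).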

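I want to fill one column of my data frame conditional on another column, using other columns as the source of the values.
An example makes this clearer: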

  Version1 Version2 Version3 Version4 Presented_version Color
1     blue      red    green   yellow                 1    NA
2      red     blue   yellow    green                 4    NA
3   yellow    green      red     blue                 3    NA


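I want to fill the "Color" column with a value from Version1/Version2/Version3/Version4. The Presented_version column tells me which of these four values I need.
For example, in row 1 Presented_version is 1, so the required value is in "Version1" ("blue"). Color in row 1 should therefore be blue.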

Can someone show me a way to do this without looping over the data frame with a lot of "if" statements?

structure(list(Version1 = structure(1:3, .Label = c("blue", "red",
"yellow"), class = "factor"), Version2 = structure(c(3L, 1L,
2L), .Label = c("blue", "green", "red"), class = "factor"), Version3 = structure(c(1L,
3L, 2L), .Label = c("green", "red", "yellow"), class = "factor"),
    Version4 = structure(3:1, .Label = c("blue", "green", "yellow"
    ), class = "factor"), Presented_version = c(1L, 4L, 3L),
    Color = c(NA, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))

========================
Edit!

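I simplified the example to explain my problem, but it differs from my actual dataset in several ways, so the solutions make assumptions that my data does not actually satisfy.
Here is a more accurate representation of the data.frame. In particular, there is no fixed match between the contents of Presented_version (named Version_presented in the data below) and the Version1...Version4 columns (it depends on an additional column, which I now call Painter), and Version1 to Version4 are not necessarily columns 1 to 4 of my dataset.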
  FillerColumn Painter Version1 Version2 Version3 Version4 Version_presented Color FillerColumn.1
1           77       A     blue      red    green   yellow                 1    NA             77
2           77       B      red     blue   yellow    green                 4    NA             77
3           77       C   yellow    green      red     blue                 3    NA             77
4           77       D      red     blue   yellow    green                 1    NA             77
structure(list(FillerColumn = c(77L, 77L, 77L, 77L), Painter = structure(1:4, .Label = c("A",
"B", "C", "D"), class = "factor"), Version1 = structure(c(1L,
2L, 3L, 2L), .Label = c("blue", "red", "yellow"), class = "factor"),
    Version2 = structure(c(3L, 1L, 2L, 1L), .Label = c("blue",
    "green", "red"), class = "factor"), Version3 = structure(c(1L,
    3L, 2L, 3L), .Label = c("green", "red", "yellow"), class = "factor"),
    Version4 = structure(c(3L, 2L, 1L, 2L), .Label = c("blue",
    "green", "yellow"), class = "factor"), Version_presented = c(1L,
    4L, 3L, 1L), Color = c(NA, NA, NA, NA), FillerColumn.1 = c(77L,
    77L, 77L, 77L)), class = "data.frame", row.names = c(NA,
-4L))

Best answer

One way, using mapply:

cols <- grep("^Version", names(df))
df$Color <- unlist(mapply(function(x, y) df[x, cols][y],
                   1:nrow(df),df$Presented_version))

df
#  Version1 Version2 Version3 Version4 Presented_version Color
#1     blue      red    green   yellow                 1  blue
#2      red     blue   yellow    green                 4 green
#3   yellow    green      red     blue                 3   red
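
A side note on the result type: because the Version columns are factors, the mapply/unlist approach may hand back a factor rather than plain strings. The sketch below is a variant, not part of the original answer, that uses vapply and as.character so that Color is a character vector. It rebuilds the first example data from the question, and the stricter pattern "^Version[0-9]+$" is my own choice.

```r
# Rebuild the first example from the question, with factor columns as in the dput()
df <- data.frame(
  Version1 = c("blue", "red", "yellow"),
  Version2 = c("red", "blue", "green"),
  Version3 = c("green", "yellow", "red"),
  Version4 = c("yellow", "green", "blue"),
  Presented_version = c(1L, 4L, 3L),
  Color = NA,
  stringsAsFactors = TRUE
)

# Positions of Version1..Version4, found by name
cols <- grep("^Version[0-9]+$", names(df))

# For each row i, read the single cell in the presented version's column
# and convert the factor value to its label
df$Color <- vapply(
  seq_len(nrow(df)),
  function(i) as.character(df[i, cols[df$Presented_version[i]]]),
  character(1)
)

df$Color
```

vapply also checks that each row yields exactly one string, which tends to catch an out-of-range version number early.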

Another option, using apply:
apply(df, 1, function(x) x[cols][as.numeric(x["Presented_version"])])
#[1] "blue"  "green" "red"
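
For the edited data in the question, a fully vectorised alternative is matrix indexing with a two-column (row, column) index. This is only a sketch and not part of the original answer. It assumes that Version_presented directly gives the number of the VersionN column to read; the question mentions that the real mapping depends on Painter, which is not modelled here. The pattern "^Version[0-9]+$" is used because a plain "^Version" would also match the Version_presented column.

```r
# Rebuild the edited example from the question (colours kept as character)
df <- data.frame(
  FillerColumn = 77L,
  Painter = c("A", "B", "C", "D"),
  Version1 = c("blue", "red", "yellow", "red"),
  Version2 = c("red", "blue", "green", "blue"),
  Version3 = c("green", "yellow", "red", "yellow"),
  Version4 = c("yellow", "green", "blue", "green"),
  Version_presented = c(1L, 4L, 3L, 1L),
  Color = NA,
  FillerColumn.1 = 77L,
  stringsAsFactors = FALSE
)

# Find Version1..Version4 by name, wherever they sit in the data frame
cols <- grep("^Version[0-9]+$", names(df))

# Character matrix holding only the candidate values
m <- as.matrix(df[cols])

# Each row of the index matrix is (row number, version number),
# so exactly one cell is picked per row, with no loop and no if statements
df$Color <- m[cbind(seq_len(nrow(df)), df$Version_presented)]

df$Color
```

With the four example rows this should give blue, green, red, red. Since the lookup works on a matrix built from the named columns, it does not matter where Version1 to Version4 are located in the data frame.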

Source: the Stack Overflow question "Fill a column in my dataframe conditional on another column, using values from a third column" (r): https://stackoverflow.com/questions/55379714/
