I have the following organizational data:

EmployeeID <- c(10:15)
Job.Title <- c("Program Manager", "Development Manager", "Developer" , "Developer", "Developer", "Summer Intern")
Level.1 <- c(1,1,1,1,1,1)
Level.2 <- c(2,2,2,2,2,2)
Level.3 <- c("",10,10,10,10,10)
Level.4 <- c("","",11,11,11,11)
Level.5 <- c("","","","","",12)
Level.6 <- c("","","","","","")
Pay.Type <- c("Salary", "Salary", "Salary", "Salary", "Salary", "Hourly")
acme = data.frame(EmployeeID, Job.Title, Level.1, Level.2, Level.3, Level.4, Level.5, Level.6, Pay.Type)

acme

  EmployeeID           Job.Title Level.1 Level.2 Level.3 Level.4 Level.5 Level.6 Pay.Type
1         10     Program Manager       1       2                                   Salary
2         11 Development Manager       1       2      10                           Salary
3         12           Developer       1       2      10      11                   Salary
4         13           Developer       1       2      10      11                   Salary
5         14           Developer       1       2      10      11                   Salary
6         15       Summer Intern       1       2      10      11      12           Hourly

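For each row, I need to find the first non-NULL (that is, non-empty) value among Level.1 to Level.6, scanning from the right: Level.6 first, then Level.5, then Level.4, and so on. I also need the second non-NULL value, found the same way. The values found for each row should go into new columns, so that the final table looks like this: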
  EmployeeID           Job.Title Level.1 Level.2 Level.3 Level.4 Level.5 Level.6 Pay.Type Supervisor Manager
1         10     Program Manager       1       2                                   Salary          2       1
2         11 Development Manager       1       2      10                           Salary         10       2
3         12           Developer       1       2      10      11                   Salary         11      10
4         13           Developer       1       2      10      11                   Salary         11      10
5         14           Developer       1       2      10      11                   Salary         11      10
6         15       Summer Intern       1       2      10      11      12           Hourly         12      11

Best answer

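We can use apply row-wise on the Level columns taken in reverse order (columns 8 down to 3), find the indices of all non-empty entries, and take the first and second of those values to fill the two new columns.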

acme[, c("Supervisor", "Manager")] <- t(apply(acme[, 8:3], 1,
                      function(x) c(x[which(x != "")[1]], x[which(x != "")[2]])))

acme

#  EmployeeID           Job.Title Level.1 Level.2 Level.3 Level.4 Level.5 Level.6 Pay.Type Supervisor Manager
#1         10     Program Manager       1       2                                   Salary          2       1
#2         11 Development Manager       1       2      10                           Salary         10       2
#3         12           Developer       1       2      10      11                   Salary         11      10
#4         13           Developer       1       2      10      11                   Salary         11      10
#5         14           Developer       1       2      10      11                   Salary         11      10
#6         15       Summer Intern       1       2      10      11      12           Hourly         12      11
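
The same logic can also be wrapped in a small named helper, which makes the handling of edge cases explicit. The following is only a sketch: `first_nonempty` is a hypothetical name that is not part of the original answer, and treating NA and whitespace-only entries as empty is an assumption about the data.

```r
# Rebuild the example data from the question.
acme <- data.frame(
  EmployeeID = 10:15,
  Job.Title  = c("Program Manager", "Development Manager", "Developer",
                 "Developer", "Developer", "Summer Intern"),
  Level.1 = c(1, 1, 1, 1, 1, 1),
  Level.2 = c(2, 2, 2, 2, 2, 2),
  Level.3 = c("", 10, 10, 10, 10, 10),
  Level.4 = c("", "", 11, 11, 11, 11),
  Level.5 = c("", "", "", "", "", 12),
  Level.6 = c("", "", "", "", "", ""),
  Pay.Type = c("Salary", "Salary", "Salary", "Salary", "Salary", "Hourly"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first n non-empty values of x,
# padded with NA when fewer than n exist.
# Assumption: NA and whitespace-only entries count as empty.
first_nonempty <- function(x, n = 2) {
  x <- trimws(as.character(x))
  keep <- x[!is.na(x) & x != ""]
  unname(keep[seq_len(n)])
}

# Columns 8:3 are Level.6 down to Level.1, so each row is scanned right to left.
res <- t(apply(acme[, 8:3], 1, first_nonempty))
acme$Supervisor <- res[, 1]
acme$Manager    <- res[, 2]
```

Because apply converts the selected columns to a character matrix, the new columns hold character values; wrap them in as.numeric() if numbers are needed.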

Edit

If there are many columns, we need to find the indices of the first and last Level columns. We can use grep for this:
mincol <- min(grep("Level", colnames(acme)))
maxcol <- max(grep("Level", colnames(acme)))

 acme[, c("Supervisor", "Manager")] <- t(apply(acme[, maxcol:mincol], 1,
                      function(x) c(x[which(x != "")[1]], x[which(x != "")[2]])))

This should work.
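
Taking the minimum and maximum index assumes the Level columns are contiguous and already in ascending order. If that cannot be guaranteed, one option is to select the columns by name and sort them by their numeric suffix. This is a sketch, not part of the original answer; it assumes the columns are named exactly "Level.<number>", and the variable names `level_cols`, `level_num` and `picked` are illustrative.

```r
# Rebuild the example data from the question.
acme <- data.frame(
  EmployeeID = 10:15,
  Job.Title  = c("Program Manager", "Development Manager", "Developer",
                 "Developer", "Developer", "Summer Intern"),
  Level.1 = c(1, 1, 1, 1, 1, 1),
  Level.2 = c(2, 2, 2, 2, 2, 2),
  Level.3 = c("", 10, 10, 10, 10, 10),
  Level.4 = c("", "", 11, 11, 11, 11),
  Level.5 = c("", "", "", "", "", 12),
  Level.6 = c("", "", "", "", "", ""),
  Pay.Type = c("Salary", "Salary", "Salary", "Salary", "Salary", "Hourly"),
  stringsAsFactors = FALSE
)

# Select the Level columns by name (assumed pattern: "Level.<number>").
level_cols <- grep("^Level\\.[0-9]+$", names(acme), value = TRUE)

# Sort by the numeric suffix, highest first, so the scan runs Level.6 -> Level.1
# regardless of where the columns sit in the data frame.
level_num  <- as.integer(sub("^Level\\.", "", level_cols))
level_cols <- level_cols[order(level_num, decreasing = TRUE)]

# For each row, keep the first two non-empty values (NA if there are fewer).
picked <- t(apply(acme[, level_cols], 1, function(x) {
  x <- trimws(x)
  idx <- which(!is.na(x) & x != "")
  unname(x[idx[1:2]])
}))

acme$Supervisor <- picked[, 1]
acme$Manager    <- picked[, 2]
```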

If we only need Supervisor, we can drop the second element. apply then returns a plain vector, so t() is not needed:
acme[, "Supervisor"] <- apply(acme[, maxcol:mincol], 1,
                            function(x) x[which(x != "")[1]])
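
For the Supervisor-only case on larger data, a vectorized alternative avoids the row-wise apply. The sketch below swaps in base R's max.col, which is not what the original answer uses; the names `lv`, `filled` and `last_idx` are illustrative, and treating NA as empty is an assumption.

```r
# Rebuild the example data from the question.
acme <- data.frame(
  EmployeeID = 10:15,
  Job.Title  = c("Program Manager", "Development Manager", "Developer",
                 "Developer", "Developer", "Summer Intern"),
  Level.1 = c(1, 1, 1, 1, 1, 1),
  Level.2 = c(2, 2, 2, 2, 2, 2),
  Level.3 = c("", 10, 10, 10, 10, 10),
  Level.4 = c("", "", 11, 11, 11, 11),
  Level.5 = c("", "", "", "", "", 12),
  Level.6 = c("", "", "", "", "", ""),
  Pay.Type = c("Salary", "Salary", "Salary", "Salary", "Salary", "Hourly"),
  stringsAsFactors = FALSE
)

# Character matrix of the Level columns in left-to-right order.
lv <- as.matrix(acme[, paste0("Level.", 1:6)])
lv[] <- trimws(lv)  # strip any padding while keeping the matrix shape

# 1 where a cell is filled, 0 where it is empty (NA counts as empty: assumption).
filled <- (!is.na(lv) & lv != "") + 0

# With ties.method = "last", max.col returns the rightmost filled column per row.
last_idx <- max.col(filled, ties.method = "last")

# Pick one cell per row via matrix indexing on (row, column) pairs.
supervisor <- lv[cbind(seq_len(nrow(lv)), last_idx)]

# A row with no filled cell would otherwise point at the last column.
supervisor[rowSums(filled) == 0] <- NA

acme$Supervisor <- supervisor
```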

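Regarding "r - How do I return the first non-NULL value in a range of columns for each row? The second non-NULL value?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42083293/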
