I have a data frame with two columns, GL and GLDESC, and want to add a third column called KIND based on some of the data in the GLDESC column.

The data frame looks like this:

      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables

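For each row of the data frame:
  • If GLDESC contains the word Payroll anywhere in the string, I want KIND to be Payroll
  • If GLDESC contains the word Gas anywhere in the string, I want KIND to be Materials
  • In all other cases, I want KIND to be Other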

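I looked for similar examples on Stack Overflow but could not find any. I also looked in R for Dummies at switch, grep, apply and regular expressions, trying to match only part of the GLDESC column and then fill the KIND column with the kind of account, but I was unable to make it work.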

    Best answer

    Since there are only two conditions, you can use a nested ifelse:

    #random data; it wasn't easy to copy-paste yours
    DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10),
      c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12",
         "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))
    
    DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
             ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))
    
    DF
    #   GL         GLDESC      KIND
    #1   8        e gas l Materials
    #2   1  c payroll12 y   Payroll
    #3  10      m GaSer v Materials
    #4   6       t asdf n     Other
    #5   2      w qweaa t     Other
    #6   4 r PayROll-12 q   Payroll
    #7   9      n asdfg a     Other
    #8   5     d GAS--2 w Materials
    #9   7     s fghfgh e     Other
    #10  3      g qweee k     Other
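
For reference, the same nested ifelse can be sketched against the data frame from the question itself rather than random data. This sketch assumes literal, case-sensitive matching (fixed = TRUE) and tests Payroll before Gas, following the order of the rules in the question; both choices are assumptions and not part of the original answer.

```r
# Data copied from the question
DF <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# Assumption: literal, case-sensitive substring match; Payroll is tested first
DF$KIND <- ifelse(grepl("Payroll", DF$GLDESC, fixed = TRUE), "Payroll",
           ifelse(grepl("Gas", DF$GLDESC, fixed = TRUE), "Materials", "Other"))

DF$KIND
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

The order of the two tests only matters for a description that contains both words; in that case the outer ifelse decides the result.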
    

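    EDIT 10/3/2016 (..after receiving more attention than expected)

    A possible solution for handling more patterns is to iterate over all the patterns and, whenever there are matches, progressively reduce the number of comparisons: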
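    ff = function(x, patterns, replacements = patterns, fill = NA, ...)
    {
        stopifnot(length(patterns) == length(replacements))
    
        # start with every element set to 'fill'
        ans = rep_len(as.character(fill), length(x))
        # indices of 'x' that have not matched any pattern yet
        empty = seq_along(x)
    
        for(i in seq_along(patterns)) {
            # test only the still-unmatched elements against pattern i
            greps = grepl(patterns[[i]], x[empty], ...)
            ans[empty[greps]] = replacements[[i]]
            # drop the matched indices so later patterns skip them
            empty = empty[!greps]
        }
    
        return(ans)
    }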
    
    ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
    # [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"
    
    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat1a|pat1b", "pat2", "pat3"),
       c("1", "2", "3"), fill = "empty")
    #[1] "1"     "1"     "3"     "empty"
    
    ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"),
       c("pat2", "pat1a|pat1b", "pat3"),
       c("2", "1", "3"), fill = "empty")
    #[1] "2"     "1"     "3"     "empty"
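
As the last two calls show, the first pattern that matches wins. If the rules are kept as a lookup table, the same precedence can be sketched with a named character vector. The helper name `classify`, the `rules` vector and the sample descriptions below are hypothetical and not part of the original answer. The sketch also assumes `\\b` word boundaries (with perl = TRUE) so that a word such as "Gasket" is not counted as "Gas".

```r
# Hypothetical helper: names(rules) are regex patterns, values are the labels.
classify <- function(x, rules, default = "Other", ...) {
  out <- rep(default, length(x))
  # Apply the rules from last to first so that earlier rules overwrite
  # later ones, i.e. the first matching rule wins.
  for (i in rev(seq_along(rules))) {
    out[grepl(names(rules)[i], x, ...)] <- rules[[i]]
  }
  out
}

# Assumed rule table: whole-word matches only
rules <- c("\\bpayroll\\b" = "Payroll", "\\bgas\\b" = "Materials")

# Illustrative descriptions ("Gasket Supplies" is made up for this sketch)
gldesc <- c("Payroll-Indir Salary Labor", "Bulk Gas",
            "Gasket Supplies", "Consumables")

classify(gldesc, rules, ignore.case = TRUE, perl = TRUE)
# [1] "Payroll"   "Materials" "Other"     "Other"
```

Unlike ff, this sketch rescans every element for every pattern, so it trades the shrinking comparison set for shorter code.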
    

    This question, "regex - Create new column in dataframe based on partial string matching other column", comes from Stack Overflow: https://stackoverflow.com/questions/19747384/
