df1 <- data.frame(MLID = c('992', '992', 'BJR'),
                  Position = c('N0', 'N1', 'N1'),
                  Weight = c(0.125, 0.58, 0.69))


df2 <- data.frame(MLID = c('992', '992', '992', '992',
                           'BJR', 'BJR', 'BJR', 'BJR'),
                  Weight = c(0, 0.251, 0.501, 1.001,
                             0, 0.251, 0.501, 1.001),
                  N0 = c(2.80, 4.05, 4.05, 4.05,
                         4.05, 4.05, 4.05, 4.05),
                  N1 = c(3.47, 4.73, 4.95, 5.15,
                         4.73, 7.73, 4.95, 5.15))

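What I want is to merge these two tables according to the following rules:
  • match on MLID
  • look at the Position (it is N0 or N1)
  • find the band the Weight falls into (like an approximate-match VLOOKUP in Excel). For example, 992 N0 is charged 2.80 for a weight in (0, 0.250) and 4.05 for a weight in (0.251, 0.500); 992 N1 is charged 3.47 for a weight in (0, 0.250); and so on.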

  • So the final output should be:
    MLID  Position  Weight  Charge
    992      N0     0.125    2.80
    992      N1     0.580    4.95
    BJR      N1     0.690    4.95
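The "approximate VLOOKUP" rule amounts to this: among the band lower bounds in df2$Weight, take the last one that is at or below the item's weight. A minimal base R sketch of that rule for a single schedule (MLID 992, Position N1, values copied from df2) uses findInterval(); the variable names are illustrative only.

```r
# Band lower bounds and the N1 charges for MLID 992, copied from df2
breaks  <- c(0, 0.251, 0.501, 1.001)
charges <- c(3.47, 4.73, 4.95, 5.15)

# findInterval() returns, for each weight, the index i of the last break with
# breaks[i] <= weight (0 if the weight is below the first break)
weights <- c(0.1, 0.3, 0.58, 1.5)
idx <- findInterval(weights, breaks)

# Index into the charge schedule; 0.58 falls in the band starting at 0.501
charge <- charges[idx]
print(data.frame(Weight = weights, Charge = charge))
```

findInterval() needs the breaks sorted in increasing order and treats each band as closed on the left, so a weight of exactly 0.251 lands in the second band.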
    

    Can this be done in R, specifically with the dplyr package?

    Best answer

    One option is a data.table rolling join. First, convert df2 to long format with melt, then join df1 to df2:
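For readers without data.table, the melt step can be sketched in base R: stack the N0 and N1 columns so that each row holds one MLID, band lower bound, Position and Charge. This is an illustrative equivalent, not part of the original answer; `pos_cols` and `long` are hypothetical names.

```r
df2 <- data.frame(MLID = rep(c("992", "BJR"), each = 4),
                  Weight = rep(c(0, 0.251, 0.501, 1.001), 2),
                  N0 = c(2.80, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05),
                  N1 = c(3.47, 4.73, 4.95, 5.15, 4.73, 7.73, 4.95, 5.15),
                  stringsAsFactors = FALSE)

# Build one block of rows per Position column, then bind the blocks together
pos_cols <- c("N0", "N1")
long <- do.call(rbind, lapply(pos_cols, function(p) {
  data.frame(MLID = df2$MLID, Weight = df2$Weight,
             Position = p, Charge = df2[[p]],
             stringsAsFactors = FALSE)
}))
head(long)
```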

    library(data.table)
    
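    setDT(df1, key = c("MLID", "Position", "Weight"))
    
    # melt() is called on a data.table; variable.factor = FALSE keeps Position as character
    df2 <- melt(setDT(df2), id.vars = c("MLID", "Weight"), variable.name = "Position",
                value.name = "Charge", variable.factor = FALSE)
    
    setDT(df2, key = c("MLID", "Position", "Weight"))
    
    # roll = TRUE matches each Weight to the last band lower bound at or below it,
    # like an approximate-match VLOOKUP; roll = "nearest" could pick the band above
    df2[df1, roll = TRUE]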
    #    MLID Weight Position Charge
    # 1:  992  0.580       N1   4.95
    # 2:  992  0.125       NO   2.80
    # 3:  BJR  0.690       N1   4.95
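Because the lookup always wants the band whose lower bound is at or below the weight, it can also be written without a join package. A minimal base R sketch, assuming (as in the question's data) that each MLID has its own set of band lower bounds in df2$Weight; `lookup_charge` is a hypothetical helper, and findInterval() stands in for the rolling join.

```r
df1 <- data.frame(MLID = c("992", "992", "BJR"),
                  Position = c("N0", "N1", "N1"),
                  Weight = c(0.125, 0.58, 0.69),
                  stringsAsFactors = FALSE)
df2 <- data.frame(MLID = rep(c("992", "BJR"), each = 4),
                  Weight = rep(c(0, 0.251, 0.501, 1.001), 2),
                  N0 = c(2.80, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05),
                  N1 = c(3.47, 4.73, 4.95, 5.15, 4.73, 7.73, 4.95, 5.15),
                  stringsAsFactors = FALSE)

# Hypothetical helper: charge for one item, given its MLID, Position and Weight
lookup_charge <- function(mlid, position, weight) {
  band <- df2[df2$MLID == mlid, ]
  band <- band[order(band$Weight), ]       # findInterval() needs sorted breaks
  i <- findInterval(weight, band$Weight)   # last lower bound <= weight (0 if none)
  if (i == 0) NA_real_ else band[[position]][i]
}

# Apply the lookup row by row over df1
df1$Charge <- mapply(lookup_charge, df1$MLID, df1$Position, df1$Weight,
                     USE.NAMES = FALSE)
print(df1)
```

Weights below the first lower bound return NA instead of silently dropping the row.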
    

    Option #2: a tidyverse-based approach:
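    library(tidyverse)
    df2 %>% gather(Position, Charge, -MLID, -Weight) %>%   # wide to long: one row per band and Position
      right_join(df1, by = c("MLID", "Position")) %>%      # pair each df1 row with its bands
      filter(Weight.x <= Weight.y) %>%                     # keep bands starting at or below the weight
      group_by(MLID, Position) %>%
      arrange(Weight.y - Weight.x) %>%                     # closest lower bound first
      slice(1) %>%
      select(MLID, Weight = Weight.y, Position, Charge)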
    
    # # A tibble: 3 x 4
    # # Groups: MLID, Position [3]
    #   MLID  Weight Position Charge
    #   <chr>  <dbl> <chr>     <dbl>
    # 1 992    0.580 N1         4.95
    # 2 992    0.125 NO         2.80
    # 3 BJR    0.690 N1         4.95
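The same join, filter and pick-closest steps can be mirrored in base R with merge(), for anyone who wants to follow them without dplyr. This is a sketch only: `long` is the long-format df2 typed out by hand, and `row_id` is a helper column added so that two df1 rows with the same MLID and Position would still be kept apart.

```r
df1 <- data.frame(MLID = c("992", "992", "BJR"),
                  Position = c("N0", "N1", "N1"),
                  Weight = c(0.125, 0.58, 0.69),
                  stringsAsFactors = FALSE)
# Long-format df2: one row per MLID, Position and band lower bound
long <- data.frame(MLID = rep(rep(c("992", "BJR"), each = 4), 2),
                   Weight = rep(c(0, 0.251, 0.501, 1.001), 4),
                   Position = rep(c("N0", "N1"), each = 8),
                   Charge = c(2.80, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05, 4.05,
                              3.47, 4.73, 4.95, 5.15, 4.73, 7.73, 4.95, 5.15),
                   stringsAsFactors = FALSE)

# Step 1 (like right_join): pair each df1 row with every band for its MLID/Position
df1$row_id <- seq_len(nrow(df1))
joined <- merge(long, df1, by = c("MLID", "Position"),
                suffixes = c(".band", ".item"))

# Step 2 (like filter): keep bands whose lower bound does not exceed the weight
joined <- joined[joined$Weight.band <= joined$Weight.item, ]

# Step 3 (like arrange + slice(1)): per df1 row, keep the largest qualifying bound
joined <- joined[order(joined$row_id, -joined$Weight.band), ]
result <- joined[!duplicated(joined$row_id),
                 c("MLID", "Position", "Weight.item", "Charge")]
names(result)[3] <- "Weight"
print(result)
```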
    

    Data:
    The OP's data, slightly modified to pass stringsAsFactors = FALSE to data.frame and so avoid unnecessary warnings.
    df1 <- data.frame(MLID=c('992','992','BJR'),
                      Position=c('NO','N1','N1'),
                      Weight=c(0.125,0.58,0.69), stringsAsFactors = FALSE)
    
    
    df2 <- data.frame(MLID=c('992','992','992','992',
                             'BJR','BJR','BJR','BJR'),
                      Weight=c(0,0.251,0.501,1.001,
                               0,0.251,0.501,1.001),
                      NO=c(2.80,4.05,4.05,4.05,
                           4.05,4.05,4.05,4.05),
                      N1=c(3.47,4.73,4.95,5.15,
                           4.73,7.73,4.95,5.15), stringsAsFactors = FALSE )
    

    For "r - How to merge two tables based on rows and columns in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51074982/

    10-12 20:45