Problem description
I would like to merge two data frames based on multiple conditions.
DF1 <- data.frame("col1" = rep(c("A","B"), 18),
"col2" = rep(c("C","D","E"), 12),
"value"= (sample(1:100,36)),
"col4" = rep(NA,36))
DF2 <- data.frame("col1" = rep("A",6),
"col2" = rep(c("C","D"),3),
"data" = rep(c(1,3),3),
"min" = seq(0,59,by=10),
"max" = seq(10,69,by=10))
> DF1
col1 col2 value col4
1 A C 22 NA
2 B D 58 NA
3 A E 35 NA
4 B C 86 NA
5 A D 37 NA
6 B E 16 NA
7 A C 46 NA
8 B D 23 NA
9 A E 88 NA
10 B C 3 NA
11 A D 33 NA
12 B E 25 NA
13 A C 19 NA
14 B D 24 NA
15 A E 9 NA
16 B C 76 NA
17 A D 62 NA
18 B E 68 NA
19 A C 97 NA
20 B D 43 NA
21 A E 8 NA
22 B C 84 NA
23 A D 36 NA
24 B E 20 NA
25 A C 57 NA
26 B D 99 NA
27 A E 42 NA
28 B C 64 NA
29 A D 87 NA
30 B E 1 NA
31 A C 78 NA
32 B D 34 NA
33 A E 41 NA
34 B C 32 NA
35 A D 10 NA
36 B E 72 NA
> DF2
col1 col2 data min max
1 A C 1 0 10
2 A D 3 10 20
3 A C 1 20 30
4 A D 3 30 40
5 A C 1 40 50
6 A D 3 50 60
DF1 is the main table and DF2 is treated as a lookup table.
If col1 and col2 of DF1 match those of DF2, and 'value' of DF1 falls between min and max of DF2, then the column 'data' from DF2 should be added to DF1. If the conditions are not met, 'data' in DF1 should be NA.
Expected output (first 6 rows):
col1 col2 value col4 data
1 A C 22 NA 1
2 B D 58 NA NA
3 A E 35 NA NA
4 B C 86 NA NA
5 A D 37 NA 3
6 B E 16 NA NA
I've tried using merge (to match col1 and col2) and then subset (to keep only the rows whose value lies between min and max), but my goal is to keep all the rows of DF1.
Does anyone have any ideas?
Recommended answer
Your data, with stringsAsFactors=FALSE added:
DF1 <- data.frame("col1" = rep(c("A","B"), 18),
"col2" = rep(c("C","D","E"), 12),
"value"= (sample(1:100,36)),
"col4" = rep(NA,36),
stringsAsFactors=FALSE)
DF2 <- data.frame("col1" = rep("A",6),
"col2" = rep(c("C","D"),3),
"data" = rep(c(1,3),3),
"min" = seq(0,59,by=10),
"max" = seq(10,69,by=10),
stringsAsFactors=FALSE)
Using dplyr: 1) merge the two data frames using left_join, 2) use ifelse to check, rowwise, whether value is between min and max, then 3) unselect the min and max columns.
library(dplyr)
left_join(DF1, DF2, by=c("col1","col2")) %>%
rowwise() %>%
mutate(data = ifelse(between(value,min,max), data, NA)) %>%
select(-min, -max)
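Not sure if you were expecting to perform some kind of aggregation, but here's the output of the above code. Because the join returns one row for every DF2 row with the same col1 and col2, a DF1 row with several candidate ranges appears several times (the first DF1 row shows up three times below):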
col1 col2 value col4 data
1 A C 23 NA NA
2 A C 23 NA 1
3 A C 23 NA NA
4 B D 59 NA NA
5 A E 57 NA NA
6 B C 8 NA NA
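If the goal is exactly one output row per DF1 row, as in the expected output of the question, the joined rows can be collapsed back again. The following is only a sketch under some assumptions, not part of the original answer: the helper columns row_id and hit are hypothetical names, the inputs are a small fixed subset invented so the result is reproducible, and when a value lies on a shared boundary (for example 30, which falls in both 20-30 and 30-40) the first matching range is taken.

```r
library(dplyr)

# Small fixed inputs (hypothetical values, chosen for reproducibility)
DF1 <- data.frame(col1 = c("A", "B", "A", "A"),
                  col2 = c("C", "D", "E", "D"),
                  value = c(22, 58, 35, 37),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(col1 = rep("A", 6),
                  col2 = rep(c("C", "D"), 3),
                  data = rep(c(1, 3), 3),
                  min = seq(0, 59, by = 10),
                  max = seq(10, 69, by = 10),
                  stringsAsFactors = FALSE)

result <- DF1 %>%
  mutate(row_id = row_number()) %>%            # remember the original DF1 row
  left_join(DF2, by = c("col1", "col2")) %>%   # one row per candidate DF2 range
  # TRUE only when a DF2 row was matched and value lies inside its range
  mutate(hit = !is.na(min) & value >= min & value <= max) %>%
  group_by(row_id, col1, col2, value) %>%
  # keep the first matching 'data' per DF1 row, or NA when nothing matched
  summarise(data = if (any(hit)) data[hit][1] else NA_real_,
            .groups = "drop") %>%
  arrange(row_id) %>%
  select(-row_id)

print(result)
```

With these inputs the result has four rows, the same as DF1, and only the (A, C, 22) and (A, D, 37) rows receive a non-NA data value. Other DF1 columns such as col4 would need to be added to the group_by call to be carried through.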