本文介绍了dplyr:inner_join与部分字符串匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
如果数据帧y
中的seed
列与x
中的string
列中的部分匹配,我想加入两个数据帧.此示例应说明:
I'd like to join two data frames if the seed
column in data frame y
is a partial match on the string
column in x
. This example should illustrate:
# What I have
x <- data.frame(idX=1:3, string=c("Motorcycle", "TractorTrailer", "Sailboat"))
y <- data_frame(idY=letters[1:3], seed=c("ractor", "otorcy", "irplan"))
x
idX string
1 1 Motorcycle
2 2 TractorTrailer
3 3 Sailboat
y
Source: local data frame [3 x 2]
idY seed
(chr) (chr)
1 a ractor
2 b otorcy
3 c irplan
# What I want
want <- data.frame(idX=c(1,2), idY=c("b", "a"), string=c("Motorcycle", "TractorTrailer"), seed=c("otorcy", "ractor"))
want
idX idY string seed
1 1 b Motorcycle otorcy
2 2 a TractorTrailer ractor
也就是说,类似
inner_join(x, y, by=stringr::str_detect(x$string, y$seed))
推荐答案
fuzzyjoin
库具有两个函数regex_inner_join
和fuzzy_inner_join
,允许您匹配部分字符串:
The fuzzyjoin
library has two functions regex_inner_join
and fuzzy_inner_join
that allow you to match partial strings:
x <- data.frame(idX=1:3, string=c("Motorcycle", "TractorTrailer", "Sailboat"))
y <- data.frame(idY=letters[1:3], seed=c("ractor", "otorcy", "irplan"))
x$string = as.character(x$string)
y$seed = as.character(y$seed)
library(fuzzyjoin)
x %>% regex_inner_join(y, by = c(string = "seed"))
idX string idY seed
1 1 Motorcycle b otorcy
2 2 TractorTrailer a ractor
library(stringr)
x %>% fuzzy_inner_join(y, by = c("string" = "seed"), match_fun = str_detect)
idX string idY seed
1 1 Motorcycle b otorcy
2 2 TractorTrailer a ractor
这篇关于dplyr:inner_join与部分字符串匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!