Problem description
I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the rows whose name matches any pattern in table A.
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")
I'm trying to do this with a for loop that looks like:
for (i in 1:nrow(A)){
for (j in 1:nrow(B)){
DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
}}
I want my resulting table DT to contain only aa1, bb1, and ddd.
But it's super slow. I'm just wondering if there's a more efficient way to do it? Many thanks!
Recommended answer
It appears there's a slight error in your sample input data: B$name is not properly declared (the values need to be wrapped in c()), and stringsAsFactors = F needs to be included for both data.frame objects:
> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)
Code
# using sapply with grepl: test each name in B against every pattern in A
> indices <- sapply(1:nrow(B), function(z) any(sapply(A$pattern, grepl, B$name[z])))
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE
> B[indices, ]
[1] "aa1" "bb1" "ddd"
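As an alternative sketch (not part of the original answer), the patterns could be collapsed into a single alternation regex so that grepl runs once over the whole name column instead of once per pattern. This assumes the patterns contain no regex metacharacters that would need escaping; the names combined and keep are chosen here for illustration only.

```r
# Sample data as corrected in the answer above
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Join all patterns into one regex: "aa|bb|cc|dd"
# (assumption: patterns are plain strings with no special regex characters)
combined <- paste(A$pattern, collapse = "|")

# One vectorized grepl call over every name; ignore.case mirrors the question's loop
keep <- grepl(combined, B$name, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame even though B has a single column
DT <- B[keep, , drop = FALSE]
print(DT)
```

With the sample data, DT should hold the rows aa1, bb1, and ddd. For a very long pattern list the combined regex can become unwieldy, in which case looping over patterns with sapply may still be the more practical choice.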