Problem description
How can one efficiently count the number of times one character string occurs within another character string?
Below is my code so far. It correctly identifies whether any instance of one string occurs in the other string. However, I do not know how to extend it from a TRUE/FALSE result to a count.
x <- ("Hello my name is Christopher. Some people call me Chris")
y <- ("Chris is an interesting person to be around")
z <- ("Because he plays sports and likes statistics")
lll <- tolower(list(x,y,z))
dict <- tolower(c("Chris", "Hell"))
mmm <- matrix(nrow=length(lll), ncol=length(dict), NA)
for (i in 1:length(lll)) {
  for (j in 1:length(dict)) {
    mmm[i,j] <- sum(grepl(dict[j],lll[i]))
  }
}
mmm
It produces:
[,1] [,2]
[1,] 1 1
[2,] 1 0
[3,] 0 0
Since the lower-case string "chris" appears twice in lll[1], I would like mmm[1,1] to be 2 instead of 1.
The real example has much higher dimensions, so I would love for the code to be vectorized instead of using my brute-force for loops.
Recommended answer
Two quick tips:
- Avoid the double for loop; you don't need it ;)
- Use the stringr package
library(stringr)
dict <- setNames(nm=dict) # simply for neatness
lapply(dict, str_count, string=lll)
# $chris
# [1] 2 1 0
#
# $hell
# [1] 1 0 0
Or as a matrix:
sapply(dict, str_count, string=lll)
# chris hell
# [1,] 2 1
# [2,] 1 0
# [3,] 0 0
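If installing stringr is not an option, roughly the same counts can probably be obtained with base R alone. The following is only a sketch under a few assumptions: count_matches is a hypothetical helper name (not part of any package), the documents are held in a plain character vector called docs rather than the list lll, and the dictionary terms are treated as literal text (fixed = TRUE) rather than as regular expressions.

```r
# Example documents and dictionary from the question, lower-cased.
# 'docs' is an illustrative name for what the question calls 'lll'.
docs <- tolower(c(
  "Hello my name is Christopher. Some people call me Chris",
  "Chris is an interesting person to be around",
  "Because he plays sports and likes statistics"
))
dict <- tolower(c("Chris", "Hell"))

# count_matches is a hypothetical helper, not an existing function.
# gregexpr() locates every non-overlapping match in each string,
# regmatches() extracts those matches, and lengths() counts them
# (giving 0 for strings with no match).
count_matches <- function(pattern, strings) {
  lengths(regmatches(strings, gregexpr(pattern, strings, fixed = TRUE)))
}

# One column per dictionary term, one row per document.
counts <- vapply(dict, count_matches, integer(length(docs)), strings = docs)
counts
```

Like the sapply call, this should give a matrix with one column per dictionary term, so counts[1, "chris"] is the value the question wanted in mmm[1,1]. Dropping fixed = TRUE would make the terms behave as regular expressions instead.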