Counting occurrences of a particular letter in a vector of words in R

Problem description

I am trying to count how many times a particular letter occurs in each element of a long vector of words. For example, I would like to count the letter "A" in each element of the following vector:

    myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

The intended output would be:

    c(1, 0, 1, 0, 3, 2, 1)

Any ideas?
Solution

Another possibility:

    myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")
    # gregexpr() returns -1 for elements with no match, so count positions > -1
    sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
    ## [1] 1 0 1 0 3 2 1

EDIT: This was begging for a benchmark:

    library(stringr); library(stringi); library(microbenchmark); library(qdapDictionaries)

    # Large test vector: the GradyAugmented word list, upper-cased
    myvec <- toupper(GradyAugmented)

    GREGEXPR <- function() sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
    GSUB <- function() nchar(gsub("[^A]", "", myvec))
    STRSPLIT <- function() sapply(strsplit(myvec, ""), function(x) sum(x == 'A'))
    STRINGR <- function() str_count(myvec, "A")
    STRINGI <- function() stri_count(myvec, fixed = "A")
    VAPPLY_STRSPLIT <- function() vapply(strsplit(myvec, ""), function(x) sum(x == 'A'), integer(1))

    (op <- microbenchmark(
        GREGEXPR(),
        GSUB(),
        STRINGI(),
        STRINGR(),
        STRSPLIT(),
        VAPPLY_STRSPLIT(),
        times = 50L))

    ## Unit: milliseconds
    ##               expr        min         lq       mean     median        uq        max neval
    ##         GREGEXPR() 477.278895 631.009023 688.845407 705.878827 745.73596  906.83006    50
    ##             GSUB() 197.127403 202.313022 209.485179 205.538073 208.90271  270.19368    50
    ##          STRINGI()   7.854174   8.354631   8.944488   8.663362   9.32927   11.19397    50
    ##          STRINGR() 618.161777 679.103777 797.905086 787.554886 906.48192 1115.59032    50
    ##         STRSPLIT() 244.721701 273.979330 331.281478 294.944321 348.07895  516.47833    50
    ##  VAPPLY_STRSPLIT() 184.042451 206.049820 253.430502 219.107882 251.80117  595.02417    50

    boxplot(op)

stringi wins by a wide margin: its median time is more than 20 times faster than the next-fastest approach in this run. The vapply + strsplit combination was a nice approach, as was the simple gsub approach. Interesting results for sure.
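For reuse with letters other than "A", the gregexpr idea from the answer can be wrapped in a small helper. This is only a sketch in base R: the name `count_letter` and its arguments are hypothetical and not part of the original answer, and it assumes `letter` is a single fixed string rather than a regular expression.

```r
# Hypothetical helper (not from the original answer): counts fixed-string
# matches of `letter` in each element of `words`, using base R only.
count_letter <- function(words, letter) {
  stopifnot(is.character(words), is.character(letter), length(letter) == 1L)
  # gregexpr() returns the match positions per element, or -1 when there is none
  matches <- gregexpr(letter, words, fixed = TRUE)
  # Count only real positions (> 0); vapply() guarantees an integer vector
  vapply(matches, function(m) sum(m > 0L), integer(1))
}

myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")
count_letter(myvec, "A")
## [1] 1 0 1 0 3 2 1
count_letter(myvec, "L")
## [1] 0 2 0 0 0 1 1
```

Given the benchmark above, swapping the body for `stri_count(words, fixed = letter)` would likely be the faster choice on large vectors when the stringi package is available.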