Determining the frequency of a string using grep

Question

If I have a vector

    x <- c("ajjss", "acdjfkj", "auyjyjjksjj")

and do:

    y <- x[grep("jj", x)]
    table(y)

I get:

    y
          ajjss auyjyjjksjj
              1           1

However, the second string "auyjyjjksjj" should count the substring "jj" twice. How can I change this from a true/false computation to actually counting the frequency of "jj"?

Also, it would be great if, for each string, the frequency of the substring divided by the string's length could be calculated.

Thanks in advance.

Solution

I solved this using gregexpr(). For each string it returns the start positions of all matches, or -1 when there is no match, so the count is the length of that result unless the first element is -1:

    x <- c("ajjss", "acdjfkj", "auyjyjjksjj")
    freq <- sapply(gregexpr("jj", x),
                   function(m) if (m[[1]] != -1) length(m) else 0)
    df <- data.frame(x, freq)
    df
    #             x freq
    # 1       ajjss    1
    # 2     acdjfkj    0
    # 3 auyjyjjksjj    2

And for the last part of the question, calculating frequency / string length:

    df$rate <- df$freq / nchar(as.character(df$x))

It is necessary to convert df$x back to a character string because data.frame(x, freq) automatically converts strings to factors unless you specify stringsAsFactors = FALSE. (That conversion is the default in R versions before 4.0.0; from 4.0.0 onward strings stay as character, and the as.character() call is harmless.)

    df
    #             x freq      rate
    # 1       ajjss    1 0.2000000
    # 2     acdjfkj    0 0.0000000
    # 3 auyjyjjksjj    2 0.1818182
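As a variation on the gregexpr() approach, the check for -1 can be avoided by extracting the matches with regmatches() and counting them with lengths(). The following is only a sketch under a few assumptions: the helper name count_substring is hypothetical and not part of the original answer, the pattern is treated as a literal string (fixed = TRUE) rather than a regular expression, and lengths() requires R 3.2.0 or later. Matches are non-overlapping, so "jjj" contains "jj" once, not twice.

```r
# Sketch: count non-overlapping occurrences of a literal substring per string.
x <- c("ajjss", "acdjfkj", "auyjyjjksjj")

# count_substring is a hypothetical helper name used for illustration only.
count_substring <- function(strings, pattern) {
  # gregexpr() locates every match; regmatches() extracts them and
  # yields character(0) for strings with no match, so lengths() gives 0.
  matches <- regmatches(strings, gregexpr(pattern, strings, fixed = TRUE))
  lengths(matches)
}

freq <- count_substring(x, "jj")

# stringsAsFactors = FALSE keeps x as character on any R version,
# so nchar() can be applied to df$x directly.
df <- data.frame(x, freq, stringsAsFactors = FALSE)
df$rate <- df$freq / nchar(df$x)
df
```

With the example vector this should give the same freq and rate columns as the answer above; the difference is only that the no-match case needs no special handling.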
09-05 18:41