I have a character vector that looks like this, and I want to split it:

library(stringr)

str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

str_split(str, '[a-z][A-Z]', n = 3)

[[1]]
[1] "Fruit Loop"       "alapeno Sandwich"

[[2]]
[1] "Red Bagel"

[[3]]
[1] "Basil Lea"    "arbeque Sauc" "ried Beef"

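But the split drops the letters matched by the pattern. I need to keep those letters at the end and the start of the words.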

Best Answer

Here are 2 approaches in base R (you can generalize them to stringr if needed).

The first inserts a placeholder at each lowercase-to-uppercase boundary, then splits on that placeholder.

strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"
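The literal SPLITHERE works as long as that text never occurs in the data. Below is a minimal sketch of the same idea with a more defensive marker. It assumes the control character \001 is absent from the input and checks that assumption before splitting. The names marker, marked, and parts are illustrative and not part of the original answer.

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

# Assumption: the control character \001 never appears in the input.
marker <- "\001"
stopifnot(!any(grepl(marker, str, fixed = TRUE)))

# Put both captured letters back around the marker, so nothing is lost.
marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), str)

# Split on the marker as a literal string rather than as a regex.
parts <- strsplit(marked, marker, fixed = TRUE)
parts
```

The two capture groups are what keep the letters: the pattern still consumes them, but the replacement writes them back on either side of the marker.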

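The second uses lookbehind and lookahead assertions, which match the position between the two letters without consuming either one: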
strsplit(str, "(?<=[a-z])(?=[A-Z])", perl=TRUE)

## [[1]]
## [1] "Fruit Loops"       "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf"     "Barbeque Sauce" "Fried Beef"
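As a quick check that the two base approaches agree, here is a small sketch. The helper name split_case_boundary is hypothetical and introduced only for illustration.

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

# Hypothetical helper: split wherever a lowercase letter is directly
# followed by an uppercase letter, keeping both letters.
split_case_boundary <- function(x) {
  strsplit(x, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
}

via_lookaround  <- split_case_boundary(str)
via_placeholder <- strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")

# Compare the two results element by element.
identical(via_lookaround, via_placeholder)
```

The lookaround version needs perl = TRUE, because the default regex engine in base R does not support lookbehind.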

EDIT: Generalized to stringr, so you can cap the result at 3 pieces if you want:
stringr::str_split(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE", 3)
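The lookaround pattern can likely be passed to str_split directly as well, since stringr's regex engine (ICU) supports lookbehind and lookahead. That would avoid the placeholder round trip. The sketch below runs under that assumption and is guarded so it only executes when stringr is installed. The variable name pieces is illustrative.

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

pieces <- NULL
if (requireNamespace("stringr", quietly = TRUE)) {
  # n = 3 caps each element at three pieces; any further boundaries
  # would stay unsplit inside the last piece.
  pieces <- stringr::str_split(str, "(?<=[a-z])(?=[A-Z])", n = 3)
}
pieces
```

Unlike base strsplit, no perl = TRUE argument is involved here, because stringr does not switch between regex engines.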

On regex - Split strings where uppercase follows lowercase in stringr, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28652193/
