I have a text string that I want to convert from

text = "end back@drive@o correct back@drive@adjust@cats@do to tok"

to

"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

More generally, I want to replace

"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"

and so on. My attempt below uses the stringr package.
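library(stringr)

# Extract every chain of three or more @-separated fields
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# Split each chain into its fields
replacements = strsplit(patterns, "@")
# Pair each field with its successor: a@b b@c ...
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
# Vectorised over patterns, so this returns one result per pattern
str_replace_all(text, pattern = patterns, replacement = replacements)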

I don't think str_replace_all is the function I'm ultimately looking for, since it (reasonably) returns
[1] "end back@drive drive@o correct back@drive@adjust to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

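Can anyone help me with this?

Many thanks.

Edit: The responses so far have been very helpful, but I am parsing a large file and don't actually know how many times this a@b@c@d... pattern will be chained. Is there a more general solution that doesn't depend on hard-coding the length of the pattern (as above)?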

Best answer

Try

pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

For "str1"
gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c"                     "a@b b@c c@d"
#[3] "a@b b@c c@d d@e e@f f@g g@h"
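
As far as I can tell, the first alternative in pat only exists to skip over stretches that contain whitespace; the second alternative does the actual work by duplicating every field that has an @ on both sides. That is why the chain length never needs to be hard-coded. A minimal sketch of that idea on its own, assuming fields never contain whitespace (the name pat_simple is made up for illustration and is not part of the pattern above):

```r
# Illustrative sketch, not the answer's original pattern.
# Assumption: fields between "@" never contain whitespace.
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")

# (?<=@)   : the field must be preceded by "@"
# [^@\\s]+ : the field itself (no "@", no whitespace)
# (?=@)    : the field must be followed by "@"
pat_simple <- "(?<=@)([^@\\s]+)(?=@)"

# Duplicate each interior field, so "a@b@c" becomes "a@b b@c"
res_text <- gsub(pat_simple, "\\1 \\1", text, perl = TRUE)
res_str1 <- gsub(pat_simple, "\\1 \\1", str1, perl = TRUE)
```

Because the first and last field of a chain are never surrounded by @ on both sides, they are left alone, and plain words such as "end" or "tok" never match at all.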

Data
text  <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
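
If you would rather avoid regular expressions altogether, the pairing logic from the question can be applied token by token. This is only a sketch in base R: expand_chain and expand_text are hypothetical helper names, and it assumes tokens are separated by single spaces.

```r
# Hypothetical helpers, base R only.
# Assumption: tokens are separated by single spaces.
expand_chain <- function(token) {
  parts <- strsplit(token, "@", fixed = TRUE)[[1]]
  # Tokens with fewer than three fields are returned unchanged
  if (length(parts) < 3) return(token)
  # Pair each field with its successor: a@b, b@c, ...
  paste(paste0(parts[-length(parts)], "@", parts[-1]), collapse = " ")
}

expand_text <- function(x) {
  # Split each string into tokens, expand every token, and glue them back
  vapply(strsplit(x, " ", fixed = TRUE), function(tokens) {
    paste(vapply(tokens, expand_chain, character(1)), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
out_text <- expand_text(text)
out_str1 <- expand_text(str1)
```

This works for chains of any length because each token is split into however many fields it has. On a large file the gsub call is likely to be faster, so treat this version mainly as a readable cross-check.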
