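I would like to know whether it is possible to remove duplicated sentences, or even duplicated blocks of text (that is, sets of duplicated sentences), from a data frame in R. In my specific case, imagine that I saved the posts of a forum but did not mark where a person was quoting an earlier post, and I now want to remove all of those quotes from the cells that hold the individual posts. Thanks for any hints or tips.
An example might look like this: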
names <- c("Richard", "Mortimer", "Elizabeth", "Jeremiah")
posts <- c("I'm trying to find a solution for a problem with my neighbour, she keeps mowing the lawn on sundays when I'm trying to sleep in from my night shift", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out.", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out. That sounds quite aggressive. How about just talking to them in a friendly way, first?", "That sounds quite aggressive. How about just talking to them in a friendly way, first? Didn't mean to sound aggressive, rather meant just being straightforward, if that makes any sense")
duplicateposts <- data.frame(names, posts)
posts2 <- c("I'm trying to find a solution for a problem with my neighbour, she keeps mowing the lawn on sundays when I'm trying to sleep in from my night shift", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out.", "That sounds quite aggressive. How about just talking to them in a friendly way, first?", "Didn't mean to sound aggressive, rather meant just being straightforward, if that makes any sense")
postsnoduplicates <- data.frame(names, posts2)
Best answer
I think you need to strsplit at the sentence endings, look for duplicates, and then paste the pieces back together again. Something like:
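# split each post after ".", "?" or "!" wherever more text follows
spl <- strsplit(as.character(duplicateposts$posts), "(?<=[.?!])(?=.)", perl=TRUE)
# drop the whitespace left at the start of each sentence
spl <- lapply(spl, trimws)
# long format: one row per sentence (values), labelled with its author (ind)
spl <- stack(setNames(spl, duplicateposts$names))
# keep only the first occurrence of each sentence, then re-join per author
aggregate(values ~ ind, data=spl[!duplicated(spl$values),], FUN=paste, collapse=" ")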
Which results in:
# ind values
#1 Richard I'm trying to find a solution for a problem with my neighbour, she keeps mowing the lawn on sundays when I'm trying to sleep in from my night shift
#2 Mortimer Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out.
#3 Elizabeth That sounds quite aggressive. How about just talking to them in a friendly way, first?
#4 Jeremiah Didn't mean to sound aggressive, rather meant just being straightforward, if that makes any sense
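The same split / deduplicate / paste steps can also be wrapped in a small function that returns one cleaned string per input post, so the result can be assigned straight back to a column. This is only a sketch under a few assumptions: the name remove_quoted_sentences and the example_posts vector are made up for illustration, sentences are assumed to end in ".", "?" or "!", and a quote is assumed to repeat the earlier sentence verbatim.

```r
# Hypothetical helper (not part of the original answer): drop every sentence
# that has already appeared in an earlier post, keeping one string per post.
remove_quoted_sentences <- function(posts) {
  # Split after ".", "?" or "!" when more text follows, then trim whitespace
  sentences <- lapply(
    strsplit(as.character(posts), "(?<=[.?!])(?=.)", perl = TRUE),
    trimws
  )
  all_sentences <- unlist(sentences)
  # Index of the post each sentence came from
  post_id <- rep(seq_along(sentences), lengths(sentences))
  # TRUE for the first occurrence of each sentence, FALSE for repeats
  keep <- !duplicated(all_sentences)
  # Group the surviving sentences by post; the explicit levels keep an
  # (empty) entry for posts whose sentences were all repeats
  cleaned <- split(all_sentences[keep],
                   factor(post_id[keep], levels = seq_along(sentences)))
  # Glue each post back together; fully quoted posts become ""
  unname(vapply(cleaned, paste, character(1), collapse = " "))
}

# Made-up example data, for illustration only
example_posts <- c("First point. Second point.",
                   "Second point. A reply?",
                   "A reply? Final remark")
remove_quoted_sentences(example_posts)
```

With the data from the question this could be used as duplicateposts$posts <- remove_quoted_sentences(duplicateposts$posts). Unlike the aggregate() call, a post that consists only of quoted sentences stays in the result as an empty string instead of disappearing. Note that both versions compare sentences across all posts, so a sentence that is repeated by coincidence (a short "Thanks!", say) would be removed as well.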