Problem description
How to convert plural words to singular in a corpus using R
I am trying to do this with the "tm" package, but I cannot find a function for it. I have tried the function below, but I cannot apply it to the corpus.
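aggregate.plurals <- function(v) {
  # v: named numeric vector of term counts (names are the terms)
  aggro_fen <- function(v, singular, plural) {
    # If the plural is present, add its count to the singular and drop it
    if (!is.na(v[plural])) {
      v[singular] <- v[singular] + v[plural]
      v <- v[-which(names(v) == plural)]
    }
    v
  }
  for (n in names(v)) {
    # Skip names that were already merged away earlier in the loop
    if (!(n %in% names(v))) next
    # paste0() joins with no separator; `Sep = ''` (capital S) is not paste()'s
    # `sep` argument, so the original produced strings like "cat s "
    v <- aggro_fen(v, n, paste0(n, "s"))
    v <- aggro_fen(v, n, paste0(n, "es"))
  }
  v
}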
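The function above works on a named vector of counts rather than on the documents themselves, which is likely why it cannot be applied to the corpus directly. A minimal sketch of the same naive "+s"/"+es" rule applied to the words of a text could look like this. `singularize_tokens` is a hypothetical helper (it is not part of tm); it only rewrites a token when the shorter form also occurs in the supplied vocabulary, so irregular plurals such as "children" or "mice" are left unchanged.

```r
# Hypothetical helper (not part of tm): naive plural -> singular mapping.
# A token is replaced by its stem only if it ends in "es" or "s" AND the
# stem also appears in `vocab`. Irregular plurals are not handled.
singularize_tokens <- function(tokens, vocab = unique(tokens)) {
  result <- tokens
  changed <- rep(FALSE, length(tokens))
  # Try "es" first (boxes -> box), then "s" (cats -> cat)
  for (suffix in c("es", "s")) {
    stem <- substr(tokens, 1, nchar(tokens) - nchar(suffix))
    hit <- !changed & endsWith(tokens, suffix) & stem %in% vocab
    result[hit] <- stem[hit]
    changed <- changed | hit
  }
  result
}

tokens <- c("the", "cat", "and", "two", "cats", "one", "box", "three", "boxes")
print(singularize_tokens(tokens))
# "the" "cat" "and" "two" "cat" "one" "box" "three" "box"
```

For a whole tm corpus, the vocabulary would presumably come from all documents (for instance the `Terms()` of a `TermDocumentMatrix`), and the helper would still need tokenizing and re-joining code before it could be wrapped in `content_transformer()` and passed to `tm_map()`; that wiring is an assumption and is not shown here.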
Recommended answer
If you are doing text analysis, you are probably looking for word normalization in a broader sense than just singular versus plural. That is stemming, and you can apply tm's `stemDocument` function (which uses the stemmer from the 'SnowballC' package) directly to a tm corpus with the `tm_map` function:
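library(tm)  # stemDocument() also needs the SnowballC package installed
reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- VCorpus(DirSource(reut21578, mode = "binary"),
                   readerControl = list(reader = readReut21578XMLasPlain))
# tm_map() returns a new corpus, so assign the result
reuters <- tm_map(reuters, stemDocument)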
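Stemming is not the same as singularization: it reduces words to a common stem that need not be a dictionary word. A short sketch calling SnowballC's `wordStem()` on a plain character vector makes the difference visible; the word list is purely illustrative.

```r
library(SnowballC)

words <- c("cats", "cat", "running", "studies")
# Snowball (Porter2) English stemmer
stems <- wordStem(words, language = "english")
print(stems)
# "cat" "cat" "run" "studi"
```

Note that "studies" becomes "studi" rather than "study", so if you need real singular words, a stemmer may not be what you want.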
Source: tm introduction vignette, https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf