


This question is a possible duplicate of Lemmatizer in R or python (am, are, is -> be?), but I'm asking it again because the previous one was closed as too broad, and its only answer is not efficient: it queries an external website, which is far too slow for the very large corpus I need to lemmatize. So part of this question overlaps with the one mentioned above.


According to Wikipedia, lemmatization is defined as:


A simple Google search for lemmatization in R only points to the wordnet package. I tried it expecting that passing the character vector c("run", "ran", "running") to a lemmatization function would return c("run", "run", "run"), but the package only provides grepl-like matching through various filter names and a dictionary.


Here is example code from the wordnet package that returns at most 5 words starting with "car", as the filter name suggests:

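library(wordnet)  # needs a local WordNet installation (set WNHOME or call setDict())
# Match index terms starting with "car" (third argument TRUE = ignore case)
filter <- getTermFilter("StartsWithFilter", "car", TRUE)
# Return at most 5 noun index terms that pass the filter
terms <- getIndexTerms("NOUN", 5, filter)
sapply(terms, getLemma)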


The above is NOT the lemmatization I'm looking for. What I want is to use R to find the true roots of words, e.g. to turn c("run", "ran", "running") into c("run", "run", "run").
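For illustration only, the desired mapping can be expressed as a dictionary lookup in base R. The `lemma_dict` table and the `lemmatize_lookup()` helper below are hypothetical and hand-written; a real lemmatizer needs a full lexicon (or a tagger) rather than six entries, and this sketch assumes lowercase input.

```r
# Hypothetical hand-written lexicon: inflected form -> lemma
lemma_dict <- c(ran = "run", running = "run", runs = "run",
                am = "be", are = "be", is = "be")

# Look each word up by name; words missing from the lexicon come back as NA,
# and those are replaced by the original word (assumed already a lemma)
lemmatize_lookup <- function(words, dict) {
  lemmas <- unname(dict[words])
  ifelse(is.na(lemmas), words, lemmas)
}

lemmatize_lookup(c("run", "ran", "running"), lemma_dict)
# [1] "run" "run" "run"
```

Because the lookup is a vectorised name match with no network access, its cost grows only with the size of the input vector.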



Hello, you can try the koRpus package, which lets you use TreeTagger (TreeTagger itself must be installed separately):

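library(koRpus)
library(koRpus.lang.en)  # English support; a separate package in newer koRpus versions
# Tag a character vector directly (format="obj") using a local TreeTagger install
tagged.results <- treetag(c("run", "ran", "running"), treetagger="manual", format="obj",
                      TT.tknz=FALSE, lang="en",
                      TT.options=list(path="./TreeTagger", preset="en"))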

##     token tag lemma lttr wclass                               desc stop stem
## 1     run  NN   run    3   noun             Noun, singular or mass   NA   NA
## 2     ran VVD   run    3   verb                   Verb, past tense   NA   NA
## 3 running VVG   run    7   verb Verb, gerund or present participle   NA   NA


See the lemma column for the result you're asking for.
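Since the asker's corpus is very large, one possible way to save time (a sketch, not part of the koRpus API) is to tag each unique word form only once and then map lemmas back onto every token with match(). The data frame below is a stand-in built from the token and lemma columns printed above; in practice it would be extracted from the tagged object (for example with koRpus's taggedText() accessor in recent versions; check your version's documentation). The `corpus_tokens` vector is hypothetical. Note the trade-off: tagging words in isolation discards sentence context, so ambiguous forms may get a less accurate tag or lemma than they would in running text.

```r
# Stand-in for the tagger output shown above (token and lemma columns only)
tagged <- data.frame(token = c("run", "ran", "running"),
                     lemma = c("run", "run", "run"),
                     stringsAsFactors = FALSE)

# Hypothetical corpus in which the same word forms repeat many times
corpus_tokens <- c("running", "ran", "run", "ran", "running")

# Only these unique forms would need to be sent to the tagger
unique_forms <- unique(corpus_tokens)

# Map every corpus token to the lemma found for its unique form
corpus_lemmas <- tagged$lemma[match(corpus_tokens, tagged$token)]
corpus_lemmas
# [1] "run" "run" "run" "run" "run"
```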


08-20 09:47