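I am using the tm package to apply stemming, and I need to convert the resulting data into a data frame.
A solution can be found in "R tm package vcorpus: Error in converting corpus to data frame", but in my case the corpus content looks like this: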

[[2195]]
i was very impress

instead of
[[2195]]
"i was very impress"

So if I apply
data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)

the result is
<NA>.

Any help is greatly appreciated!

Here is some example code:
library(tm)  # stemDocument also needs the SnowballC package installed
sentence <- c("a small thread was loose on the sandals, otherwise it looked good")
mycorpus <- Corpus(VectorSource(sentence))
mycorpus <- tm_map(mycorpus, stemDocument, language = "english")

inspect(mycorpus)

[[1]]
a small thread was loo on the sandals, otherwi it look good

data.frame(text=unlist(sapply(mycorpus, `[`, "content")), stringsAsFactors=FALSE)

 text
1 <NA>

Best answer

By applying

gsub("http\\w+", "", mycorpus)

the output has class = character (gsub coerces its input to a character vector), so it works in my case.
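
For readers on other tm versions, a more direct conversion can be sketched as follows. This is only a minimal sketch under a few assumptions: each document holds plain text, as.character() returns a document's text (which, as far as I know, holds for the plain-text documents tm builds from a VectorSource), and corpus_to_df is a hypothetical helper name, not part of tm. The <NA> in the question is probably explained by the documents in that tm version being plain character vectors, so indexing them with "content" looks up a name that does not exist.

```r
library(tm)  # stemDocument() additionally needs the SnowballC package installed

# Hypothetical helper (not part of tm): returns one row per document.
corpus_to_df <- function(corpus) {
  texts <- unlist(lapply(
    corpus,
    # Collapse to a single string in case a document spans several lines.
    function(doc) paste(as.character(doc), collapse = " ")
  ))
  data.frame(text = unname(texts), stringsAsFactors = FALSE)
}

sentence <- c(
  "a small thread was loose on the sandals, otherwise it looked good",
  "i was very impressed"
)

# VCorpus is used here as an assumption; the helper should also work
# with a corpus created by Corpus().
mycorpus <- VCorpus(VectorSource(sentence))
mycorpus <- tm_map(mycorpus, stemDocument, language = "english")

df <- corpus_to_df(mycorpus)
str(df)
```

Unlike the gsub approach, this does not depend on how a whole corpus object is coerced to character, only on how individual documents are.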

Source: a similar question on Stack Overflow, "r - converting a corpus to a data.frame in R": https://stackoverflow.com/questions/25490088/
