R DocumentTermMatrix control list not working: unknown parameters are silently ignored

Problem description

I have the following two DTMs:

dtm <- DocumentTermMatrix(t)

dtmImproved <- DocumentTermMatrix(t, 
               control=list(minWordLength = 4, minDocFreq=5))

When I run this, I get two identical DTMs, and if I inspect dtmImproved it still contains words with 3 characters. Why doesn't the minWordLength parameter work? Thank you!

> dtm
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)
> dtmImproved
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)

Recommended answer

dtmImproved <- DocumentTermMatrix(t, control=list(wordLengths=c(4, 15), 
                                   bounds = list(global = c(5,Inf))))

This solves the problem! The lack of proper documentation really gets me down (:
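Since the unrecognized names in the control list (minWordLength, minDocFreq) produce no error or warning, it can help to check the list yourself before building the DTM. Below is a minimal sketch in base R. The helper check_control and the vector known_controls are hypothetical names made up for this illustration and are not part of tm; the list of recognized option names is my assumption based on the termFreq and DocumentTermMatrix help pages and may differ in your tm version.

```r
# Option names assumed to be recognized by tm (an assumption; check ?termFreq
# and ?DocumentTermMatrix for the version you have installed).
known_controls <- c("tokenize", "tolower", "language", "removePunctuation",
                    "removeNumbers", "stopwords", "stemming", "dictionary",
                    "bounds", "wordLengths", "weighting")

# Hypothetical helper: warn about control names that are not in `known`
# and return them invisibly.
check_control <- function(control, known = known_controls) {
  unknown <- setdiff(names(control), known)
  if (length(unknown) > 0) {
    warning("Control options that would likely be ignored: ",
            paste(unknown, collapse = ", "))
  }
  invisible(unknown)
}

# The control list from the question: both names are flagged.
check_control(list(minWordLength = 4, minDocFreq = 5))

# The control list from the answer: nothing is flagged.
check_control(list(wordLengths = c(4, 15),
                   bounds = list(global = c(5, Inf))))
```

With a check like this, the original call would have produced a warning naming minWordLength and minDocFreq instead of quietly returning an unfiltered matrix.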

