Problem description
I'm using the tm package and want to get Flesch-Kincaid scores for a document in R. I found that the koRpus package offers a lot of metrics, including reading level, and started using it. However, the object returned seems to be a very complicated S4 object that I don't understand how to parse.
So, I apply this to my corpus:
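library(tm)       # Corpus, DirSource
library(koRpus)   # tokenize, flesch.kincaid
library(foreach)  # foreach, %dopar% (runs sequentially unless a parallel backend is registered)
# Sample Latin texts (Ovid) shipped with tm
txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))
f <- function(x) tokenize(x, format="obj", lang='en')
g <- function(x) flesch.kincaid(x)
x <- foreach(i=1:5) %dopar% g(f(d[[i]]))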
x is then a list holding the flesch.kincaid result for each of the five Ovid documents.
> x[[1]]
Flesch-Kincaid Grade Level
Parameters: default
Grade: 13.62
Age: 18.62
Text language: en
How can I get just the return values, grade = 13.62 and age = 18.62? The str() output is so large that it's hard to parse, e.g.:
> str(x[[1]])
Formal class 'kRp.readability' [package "koRpus"] with 49 slots
..@ hyphen :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
.. .. ..@ lang : chr "en"
.. .. ..@ desc :List of 5
.. .. .. ..$ num.syll : num 196
.. .. .. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. .. .. ..$ avg.syll.word : num 2.18
.. .. .. ..$ syll.per100 : num 218
.. .. ..@ hyphen:'data.frame': 90 obs. of 2 variables:
.. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
.. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ...
..@ param :List of 1
.. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
.. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
..@ ARI :List of 1
.. ..$ : logi NA
..@ ARI.NRI :List of 1
.. ..$ : logi NA
..@ ARI.simple :List of 1
.. ..$ : logi NA
..@ Bormuth :List of 1
.. ..$ : logi NA
..@ Coleman :List of 1
.. ..$ : logi NA
..@ Coleman.Liau :List of 1
.. ..$ : logi NA
..@ Dale.Chall :List of 1
.. ..$ : logi NA
..@ Dale.Chall.PSK :List of 1
.. ..$ : logi NA
..@ Dale.Chall.old :List of 1
.. ..$ : logi NA
..@ Danielson.Bryan :List of 1
.. ..$ : logi NA
..@ Dickes.Steiwer :List of 1
.. ..$ : logi NA
..@ DRP :List of 1
.. ..$ : logi NA
..@ ELF :List of 1
.. ..$ : logi NA
..@ Flesch :List of 1
.. ..$ : logi NA
..@ Flesch.PSK :List of 1
.. ..$ : logi NA
..@ Flesch.de :List of 1
.. ..$ : logi NA
..@ Flesch.es :List of 1
.. ..$ : logi NA
..@ Flesch.fr :List of 1
.. ..$ : logi NA
..@ Flesch.nl :List of 1
.. ..$ : logi NA
..@ Flesch.Kincaid :List of 3
.. ..$ flavour: chr "default"
.. ..$ grade : num 13.6
.. ..$ age : num 18.6
..@ Farr.Jenkins.Paterson :List of 1
.. ..$ : logi NA
..@ Farr.Jenkins.Paterson.PSK:List of 1
.. ..$ : logi NA
..@ FOG :List of 1
.. ..$ : logi NA
..@ FOG.PSK :List of 1
.. ..$ : logi NA
..@ FOG.NRI :List of 1
.. ..$ : logi NA
..@ FORCAST :List of 1
.. ..$ : logi NA
..@ FORCAST.RGL :List of 1
.. ..$ : logi NA
..@ Fucks :List of 1
.. ..$ : logi NA
..@ Harris.Jacobson :List of 1
.. ..$ : logi NA
..@ Linsear.Write :List of 1
.. ..$ : logi NA
..@ LIX :List of 1
.. ..$ : logi NA
..@ RIX :List of 1
.. ..$ : logi NA
..@ SMOG :List of 1
.. ..$ : logi NA
..@ SMOG.de :List of 1
.. ..$ : logi NA
..@ SMOG.C :List of 1
.. ..$ : logi NA
..@ SMOG.simple :List of 1
.. ..$ : logi NA
..@ Spache :List of 1
.. ..$ : logi NA
..@ Spache.old :List of 1
.. ..$ : logi NA
..@ Strain :List of 1
.. ..$ : logi NA
..@ Traenkle.Bailer :List of 1
.. ..$ : logi NA
..@ TRI :List of 1
.. ..$ : logi NA
..@ Wheeler.Smith :List of 1
.. ..$ : logi NA
..@ Wheeler.Smith.de :List of 1
.. ..$ : logi NA
..@ Wiener.STF :List of 1
.. ..$ : logi NA
..@ lang : chr "en"
..@ desc :List of 26
.. ..$ sentences : int 10
.. ..$ words : int 90
.. ..$ letters : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
.. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
.. ..$ all.chars : int 692
.. ..$ syllables : Named num [1:5] 196 25 32 25 8
.. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
.. ..$ lttr.distrib : num [1:6, 1:11] 0 0 90 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
.. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. ..$ syll.uniq.distrib : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
.. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
.. ..$ punct : int 17
.. ..$ conjunctions : int 0
.. ..$ prepositions : int 0
.. ..$ pronouns : int 0
.. ..$ foreign : int 0
.. ..$ TTR : num 0.844
.. ..$ avg.sentc.length : num 9
.. ..$ avg.word.length : num 5.47
.. ..$ avg.syll.word : num 2.18
.. ..$ sntc.per.word : num 0.111
.. ..$ sntc.per100 : num 11.1
.. ..$ lett.per100 : num 547
.. ..$ syll.per100 : num 218
.. ..$ FOG.hard.words : NULL
.. ..$ Bormuth.NOL : NULL
.. ..$ Dale.Chall.NOL : NULL
.. ..$ Harris.Jacobson.NOL: NULL
.. ..$ Spache.NOL : NULL
..@ TT.res :'data.frame': 107 obs. of 6 variables:
.. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ...
.. ..$ tag : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ...
.. ..$ lemma : chr [1:107] "" "" "" "" ...
.. ..$ lttr : num [1:107] 2 4 2 3 5 6 3 5 6 1 ...
.. ..$ wclass: chr [1:107] "word" "word" "word" "word" ...
.. ..$ desc : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ...
Ideally, I'd like to assign the F-K score to the meta(d) back in tm.
I'd appreciate learning how to understand this return object and pull out its values, but if there's another, better, faster way to get an F-K score, I'm all ears!
Recommended answer
Similar to @Paul's answer, but as a one-liner:
sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade'))
[,1] [,2] [,3] [,4] [,5]
age 18.61778 17.62351 17.77699 18.29032 18.645
grade 13.61778 12.62351 12.77699 13.29032 13.645
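To show what the one-liner is doing, here is a minimal sketch of the same slot access on a toy S4 class. ToyReadability, x_toy, fk_grade, fk_age and fk are hypothetical names made up for this illustration; the class only mimics the Flesch.Kincaid slot seen in the str() output so that the example runs without koRpus installed. It uses vapply instead of sapply so that the result is guaranteed to be a plain numeric vector.

```r
library(methods)

# Hypothetical stand-in for koRpus's kRp.readability class (assumption:
# only the Flesch.Kincaid slot from the str() output is reproduced).
setClass("ToyReadability", slots = c(Flesch.Kincaid = "list"))

# Two toy objects, filled with the first two columns of the output above.
x_toy <- list(
  new("ToyReadability",
      Flesch.Kincaid = list(flavour = "default", grade = 13.61778, age = 18.61778)),
  new("ToyReadability",
      Flesch.Kincaid = list(flavour = "default", grade = 12.62351, age = 17.62351))
)

# Single object: `@` selects the slot, `$` selects the list element.
x_toy[[1]]@Flesch.Kincaid$grade

# slot() is the functional form of `@`, which is why it works inside lapply().
slot(x_toy[[1]], "Flesch.Kincaid")$age

# All objects at once; vapply() enforces one numeric value per object.
fk_grade <- vapply(x_toy, function(obj) obj@Flesch.Kincaid$grade, numeric(1))
fk_age   <- vapply(x_toy, function(obj) obj@Flesch.Kincaid$age, numeric(1))

# Collect the scores in a data frame, one row per document.
fk <- data.frame(doc = seq_along(x_toy), grade = fk_grade, age = fk_age)
print(fk)
```

With the real objects, the same pattern should apply by replacing x_toy with x. Once the grades are in a numeric vector, tm's meta replacement function (something along the lines of meta(d, "fk_grade") <- fk_grade, where "fk_grade" is an arbitrary tag name) looks like the natural way to attach them to the corpus, though I have not checked that against a particular tm version.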