How to extract content from a koRpus object in R?

Question

I'm using the tm package and want to get the Flesch-Kincaid scores for a document in R. I found that the koRpus package offers a lot of metrics, including reading level, so I started using it. However, the object it returns is a very complicated S4 object that I don't know how to parse.

So I apply this to my corpus:

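library(tm)       # Corpus(), DirSource()
library(koRpus)   # tokenize(), flesch.kincaid()
library(foreach)  # foreach(), %dopar% (falls back to sequential with a warning if no parallel backend is registered)

txt <- system.file("texts", "txt", package = "tm")
(d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))

f <- function(x) tokenize(x, format = "obj", lang = "en")
g <- function(x) flesch.kincaid(x)
x <- foreach(i = 1:5) %dopar% g(f(d[[i]]))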

x is then a list holding the flesch.kincaid result for each of the Ovid texts.

> x[[1]]

Flesch-Kincaid Grade Level
  Parameters: default
       Grade: 13.62
         Age: 18.62

Text language: en

How can I get just the return values, grade = 13.62 and age = 18.62? The output of str(x) is so large that it's hard to parse, e.g.:

> str(x[[1]])
Formal class 'kRp.readability' [package "koRpus"] with 49 slots
  ..@ hyphen                   :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
  .. .. ..@ lang  : chr "en"
  .. .. ..@ desc  :List of 5
  .. .. .. ..$ num.syll         : num 196
  .. .. .. ..$ syll.distrib     : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. .. .. ..$ avg.syll.word    : num 2.18
  .. .. .. ..$ syll.per100      : num 218
  .. .. ..@ hyphen:'data.frame':    90 obs. of  2 variables:
  .. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
  .. .. .. ..$ word: chr [1:90] "Si" "quis" "in" "hoc" ...
  ..@ param                    :List of 1
  .. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
  .. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
  ..@ ARI                      :List of 1
  .. ..$ : logi NA
  ..@ ARI.NRI                  :List of 1
  .. ..$ : logi NA
  ..@ ARI.simple               :List of 1
  .. ..$ : logi NA
  ..@ Bormuth                  :List of 1
  .. ..$ : logi NA
  ..@ Coleman                  :List of 1
  .. ..$ : logi NA
  ..@ Coleman.Liau             :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall               :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall.PSK           :List of 1
  .. ..$ : logi NA
  ..@ Dale.Chall.old           :List of 1
  .. ..$ : logi NA
  ..@ Danielson.Bryan          :List of 1
  .. ..$ : logi NA
  ..@ Dickes.Steiwer           :List of 1
  .. ..$ : logi NA
  ..@ DRP                      :List of 1
  .. ..$ : logi NA
  ..@ ELF                      :List of 1
  .. ..$ : logi NA
  ..@ Flesch                   :List of 1
  .. ..$ : logi NA
  ..@ Flesch.PSK               :List of 1
  .. ..$ : logi NA
  ..@ Flesch.de                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.es                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.fr                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.nl                :List of 1
  .. ..$ : logi NA
  ..@ Flesch.Kincaid           :List of 3
  .. ..$ flavour: chr "default"
  .. ..$ grade  : num 13.6
  .. ..$ age    : num 18.6
  ..@ Farr.Jenkins.Paterson    :List of 1
  .. ..$ : logi NA
  ..@ Farr.Jenkins.Paterson.PSK:List of 1
  .. ..$ : logi NA
  ..@ FOG                      :List of 1
  .. ..$ : logi NA
  ..@ FOG.PSK                  :List of 1
  .. ..$ : logi NA
  ..@ FOG.NRI                  :List of 1
  .. ..$ : logi NA
  ..@ FORCAST                  :List of 1
  .. ..$ : logi NA
  ..@ FORCAST.RGL              :List of 1
  .. ..$ : logi NA
  ..@ Fucks                    :List of 1
  .. ..$ : logi NA
  ..@ Harris.Jacobson          :List of 1
  .. ..$ : logi NA
  ..@ Linsear.Write            :List of 1
  .. ..$ : logi NA
  ..@ LIX                      :List of 1
  .. ..$ : logi NA
  ..@ RIX                      :List of 1
  .. ..$ : logi NA
  ..@ SMOG                     :List of 1
  .. ..$ : logi NA
  ..@ SMOG.de                  :List of 1
  .. ..$ : logi NA
  ..@ SMOG.C                   :List of 1
  .. ..$ : logi NA
  ..@ SMOG.simple              :List of 1
  .. ..$ : logi NA
  ..@ Spache                   :List of 1
  .. ..$ : logi NA
  ..@ Spache.old               :List of 1
  .. ..$ : logi NA
  ..@ Strain                   :List of 1
  .. ..$ : logi NA
  ..@ Traenkle.Bailer          :List of 1
  .. ..$ : logi NA
  ..@ TRI                      :List of 1
  .. ..$ : logi NA
  ..@ Wheeler.Smith            :List of 1
  .. ..$ : logi NA
  ..@ Wheeler.Smith.de         :List of 1
  .. ..$ : logi NA
  ..@ Wiener.STF               :List of 1
  .. ..$ : logi NA
  ..@ lang                     : chr "en"
  ..@ desc                     :List of 26
  .. ..$ sentences          : int 10
  .. ..$ words              : int 90
  .. ..$ letters            : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
  .. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
  .. ..$ all.chars          : int 692
  .. ..$ syllables          : Named num [1:5] 196 25 32 25 8
  .. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
  .. ..$ lttr.distrib       : num [1:6, 1:11] 0 0 90 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
  .. ..$ syll.distrib       : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. ..$ syll.uniq.distrib  : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
  .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
  .. ..$ punct              : int 17
  .. ..$ conjunctions       : int 0
  .. ..$ prepositions       : int 0
  .. ..$ pronouns           : int 0
  .. ..$ foreign            : int 0
  .. ..$ TTR                : num 0.844
  .. ..$ avg.sentc.length   : num 9
  .. ..$ avg.word.length    : num 5.47
  .. ..$ avg.syll.word      : num 2.18
  .. ..$ sntc.per.word      : num 0.111
  .. ..$ sntc.per100        : num 11.1
  .. ..$ lett.per100        : num 547
  .. ..$ syll.per100        : num 218
  .. ..$ FOG.hard.words     : NULL
  .. ..$ Bormuth.NOL        : NULL
  .. ..$ Dale.Chall.NOL     : NULL
  .. ..$ Harris.Jacobson.NOL: NULL
  .. ..$ Spache.NOL         : NULL
  ..@ TT.res                   :'data.frame':   107 obs. of  6 variables:
  .. ..$ token : chr [1:107] "Si" "quis" "in" "hoc" ...
  .. ..$ tag   : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ...
  .. ..$ lemma : chr [1:107] "" "" "" "" ...
  .. ..$ lttr  : num [1:107] 2 4 2 3 5 6 3 5 6 1 ...
  .. ..$ wclass: chr [1:107] "word" "word" "word" "word" ...
  .. ..$ desc  : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ...

Ideally, I'd like to assign the F-K score back to meta(d) in tm.

I'd appreciate learning how to understand this return object and take out its values. And if there's another, better, faster way to get an F-K score, I'm all ears!

Recommended answer

Similar to @Paul's answer, but as a one-liner:

   sapply(lapply(x,slot,'Flesch.Kincaid'),'[',c('age','grade'))
      [,1]     [,2]     [,3]     [,4]     [,5]
age   18.61778 17.62351 17.77699 18.29032 18.645
grade 13.61778 12.62351 12.77699 13.29032 13.645
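
To unpack what the one-liner does, here is a minimal sketch that runs without koRpus installed. It assumes a hypothetical stand-in class, MockReadability, that models only two of the slots shown in the str() output; the helper mock_result and the grade values are made up for illustration. Because the Flesch.Kincaid slot is a list, subsetting it with `[` returns a list, so sapply produces a list-mode matrix; the sketch uses unlist() with vapply() to get a plain numeric matrix instead.

```r
library(methods)  # setClass(), new(), slot(), slotNames()

# Hypothetical stand-in for koRpus's 'kRp.readability' class: only two slots
# are modelled here, so the example does not need koRpus.
setClass("MockReadability",
         representation(lang = "character", Flesch.Kincaid = "list"))

# Illustrative constructor (not part of koRpus); age = grade + 5 merely mirrors
# the numbers printed in the question.
mock_result <- function(grade) {
  new("MockReadability",
      lang = "en",
      Flesch.Kincaid = list(flavour = "default", grade = grade, age = grade + 5))
}
x <- list(mock_result(13.62), mock_result(12.62))

# List the available slots, then read one with `@` or, equivalently, slot().
slotNames(x[[1]])
x[[1]]@Flesch.Kincaid$grade
slot(x[[1]], "Flesch.Kincaid")$age

# Collect grade and age from every result into a numeric matrix
# (rows: grade, age; one column per document).
fk <- vapply(
  x,
  function(res) unlist(slot(res, "Flesch.Kincaid")[c("grade", "age")]),
  c(grade = 0, age = 0)
)
fk
```

With real koRpus results in x, the same vapply() call should apply unchanged, since it touches only the Flesch.Kincaid slot. To store the scores in the corpus, something like meta(d, "fk_grade") <- fk["grade", ] may work; that is based on tm's documented meta<- replacement function and was not run here.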
