How to count the number of sentences in a text in R


Question

I read a text into R using the readChar() function. I want to test the hypothesis that the sentences of the text contain as many occurrences of the letter "a" as of the letter "b". I recently discovered the {stringr} package, which has helped me a great deal with tasks such as counting the number of characters and the total number of occurrences of each letter in the entire text. Now I need to know the number of sentences in the whole text. Does R have a function that can help me do that? Thank you very much!

Recommended answer

Thank you @gui11aume for your answer. A very good package I just found that can help with this is {openNLP}. Here is the code:

install.packages("openNLP") ## Installs the required natural language processing (NLP) package
install.packages("openNLPmodels.en") ## Installs the model files for the English language
library(openNLP) ## Loads the package for use in the task
library(openNLPmodels.en) ## Loads the model files for the English language

text = "Dr. Brown and Mrs. Theresa will be away from a very long time!!! I can't wait to see them again." ## This text has unusual punctuation, as suggested by @gui11aume

x = sentDetect(text, language = "en") ## sentDetect() is the function to use. It detects and separates the sentences in a text. The first argument is the character vector (the text) and the second argument is the language.
x ## Displays the different sentences in the string vector (or text).

[1] "Dr. Brown and Mrs. Theresa will be away from a very long time!!! "
[2] "I can't wait to see them again."

length(x) ## Displays the number of sentences in the string vector (or text).

[1] 2
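
The code above relies on sentDetect(). Newer releases of {openNLP} appear to expose sentence detection through annotator objects instead, so the call may not be available in a current installation. The following is a minimal sketch of the same task with that interface, assuming the {NLP} and {openNLP} packages and a working Java setup are installed. count_sentences() is a hypothetical helper name, not part of either package.

```r
# Sketch only: assumes {NLP} and {openNLP} (which needs Java via {rJava})
# are installed. count_sentences() is a hypothetical helper, not a package function.
count_sentences <- function(text) {
  s <- NLP::as.String(text)                                   # annotators work on String objects
  annotator <- openNLP::Maxent_Sent_Token_Annotator(language = "en")
  spans <- NLP::annotate(s, annotator)                        # one annotation per detected sentence
  sentences <- s[spans]                                       # extract the sentence substrings
  list(sentences = sentences, n = length(sentences))
}

text <- "Dr. Brown and Mrs. Theresa will be away from a very long time!!! I can't wait to see them again."

# Only run the example when both packages are actually available.
if (requireNamespace("NLP", quietly = TRUE) &&
    requireNamespace("openNLP", quietly = TRUE)) {
  result <- count_sentences(text)
  print(result$sentences)  # the detected sentences
  print(result$n)          # the number of sentences
}
```

Where exactly the model splits a text depends on the installed model, so the count is worth checking on a sample of your own data.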

The {openNLP} package is well suited to natural language processing in R. You can find a short introduction to it online, or check the package's documentation.

Three more languages are supported by the package. You just need to install and load the corresponding model files.

  1. {openNLPmodels.es} for Spanish
  2. {openNLPmodels.ge} for German
  3. {openNLPmodels.th} for Thai
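
Returning to the goal stated in the question: once the text has been split into sentences, the "a" and "b" counts can be compared sentence by sentence. This is a minimal base R sketch, assuming the sentences are already in a character vector. count_letter() is a hypothetical helper, and counting case-insensitively is an assumption rather than something the question specifies.

```r
# Hypothetical helper: counts occurrences of one letter in each sentence,
# ignoring case. stringr::str_count(tolower(x), "a") would be an equivalent call.
count_letter <- function(sentences, letter) {
  lower <- tolower(sentences)
  matches <- gregexpr(letter, lower, fixed = TRUE)
  lengths(regmatches(lower, matches))  # 0 for sentences without the letter
}

# The two sentences returned by the sentence detector in the answer.
sentences <- c(
  "Dr. Brown and Mrs. Theresa will be away from a very long time!!! ",
  "I can't wait to see them again."
)

n_a <- count_letter(sentences, "a")
n_b <- count_letter(sentences, "b")

print(data.frame(sentence = seq_along(sentences), a = n_a, b = n_b))
print(n_a - n_b)  # per-sentence difference; prints 3 4

# With many sentences, a paired test on the two count vectors is one option,
# for example: wilcox.test(n_a, n_b, paired = TRUE)
```

Two sentences are far too few for any test, so the example only prints the counts. Which test suits the hypothesis depends on the data and is left open here.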


08-23 00:50