I'm using Ruby to calculate the Gunning Fog Index of some content I have, and I have successfully implemented the algorithm described here:

Gunning Fog Index

I'm using the following method to count the number of syllables in each word:

Tokenizer = /([aeiouy]{1,3})/

def count_syllables(word)

  len = 0

  if word[-3..-1] == 'ing' then
    len += 1
    word = word[0...-3]
  end

  got = word.scan(Tokenizer)
  len += got.size()

  if got.size() > 1 and got[-1] == ['e'] and
      word[-1].chr() == 'e' and
      word[-2].chr() != 'l' then
    len -= 1
  end

  return len

end

Sometimes it counts words that have only 2 syllables as having 3. Can anyone offer advice, or does anyone know of a better approach?

text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')

word_array = text.split(' ')

word_array.each do |word|
    puts word if count_syllables(word) > 2
end

"themselves" is counted as 3 syllables, but it has only 2.

Best answer

The function I gave you before is based on the simple rules outlined here:



Here is the code:

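def new_count(word)
  word.downcase!                   # note: modifies the caller's string in place
  return 1 if word.length <= 3     # words of three letters or fewer count as one syllable
  # drop a final -ed, and a final -es or -e unless it follows "l" or a vowel
  word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  word.sub!(/^y/, '')              # ignore a leading "y"
  word.scan(/[aeiouy]{1,2}/).size  # each run of one or two vowels counts as one syllable
end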

Obviously this isn't perfect either, but that is all you will ever get with a heuristic like this.

EDIT:

I changed the code slightly to handle a leading "y", and fixed the regex to handle "les" endings better (for example, in "candles").
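To see what those two tweaks do, the steps can be traced word by word. The sketch below is only an illustration: `trace_count` is a hypothetical helper, not part of the answer's code. It applies the same regular expressions as `new_count`, but without modifying its argument, and returns the intermediate stem and vowel groups.

```ruby
# Hypothetical helper for illustration: same rules as new_count, but it
# leaves the input untouched and exposes the intermediate results.
def trace_count(word)
  stem = word.downcase
  # Words of three letters or fewer are counted as one syllable outright.
  return { stem: stem, groups: [], count: 1 } if stem.length <= 3

  # Drop a final -ed, and a final -es or -e unless preceded by "l" or a vowel.
  stem = stem.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  # Ignore a leading "y".
  stem = stem.sub(/^y/, '')
  # Each run of one or two vowels counts as one syllable.
  groups = stem.scan(/[aeiouy]{1,2}/)
  { stem: stem, groups: groups, count: groups.size }
end

%w[candles makes young themselves].each do |word|
  puts "#{word}: #{trace_count(word)}"
end
```

With these rules "candles" keeps its ending (the "l" blocks the "-es" rule) and yields the groups "a" and "e", while "makes" is cut down to "ma" and "young" to "oung", so each of those two counts as one syllable.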

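Here is a comparison using the text from the question:

# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')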

words = text.split(' ')

words.each do |word|
  old = count_syllables(word.dup)
  new = new_count(word.dup)
  puts "#{word}: \t#{old}\t#{new}" if old != new
end

The output is:
logorrhoea:     3   4
used:   2   1
makes:  2   1
themselves:     3   2

So this does seem to be an improvement.
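To tie this back to the Gunning Fog Index from the question: the index is usually stated as 0.4 × (average words per sentence + 100 × complex words / total words), where a complex word has three or more syllables. The following is a minimal sketch under stated assumptions, not the asker's implementation: `gunning_fog` is a hypothetical name, the syllable counter is supplied by the caller as a block, sentences are split naively on `.`, `!` and `?`, and the usual exclusions (proper nouns, compound words, familiar jargon) are ignored.

```ruby
# Hypothetical sketch of the Gunning Fog Index.
# The block is a caller-supplied syllable counter; a word is treated as
# "complex" when the block returns 3 or more.
def gunning_fog(text)
  # Naive sentence split on terminal punctuation (an assumption of this sketch).
  sentences = text.split(/[.!?]+/).map(&:strip).reject(&:empty?)
  words = text.scan(/[A-Za-z']+/)
  return 0.0 if sentences.empty? || words.empty?

  complex = words.count { |word| yield(word) >= 3 }
  0.4 * (words.size.to_f / sentences.size + 100.0 * complex / words.size)
end

# Usage with a deliberately crude counter: one syllable per run of vowels.
sample = "The cat sat on the mat. Philosophy is complicated."
score = gunning_fog(sample) { |word| word.scan(/[aeiouy]{1,2}/i).size }
puts score.round(2)
```

Passing the counter as a block keeps the formula separate from the syllable heuristic, so a call such as `gunning_fog(text) { |w| new_count(w.dup) }` could swap in the function above without changing the index calculation.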

This comes from a similar question about counting syllables in Ruby on Stack Overflow: https://stackoverflow.com/questions/1271918/
