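java - How to count syllables in text using regular expressions and Java

I have text as a String and need to count the number of syllables in each word. I tried splitting the whole text into an array of words and then processing each word separately, using regular expressions for this. But the syllable pattern does not work properly. Please suggest how to change it so that it counts the correct number of syllables. My initial code: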

public int getNumSyllables()
{
    String[] words = getText().toLowerCase().split("[a-zA-Z]+");
    int count = 0;
    List<String> tokens = new ArrayList<String>();
    for (String word : words) {
        tokens = Arrays.asList(word.split("[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*"));
        count += tokens.size();
    }
    return count;
}

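Best answer

This question comes from UCSD's Java course, right?

I think you should provide enough information with this question so that people who want to help are not confused. Here is my own solution, which has been tested against the local test cases as well as UCSD's OJ.

You left out some important information about how a syllable is defined in this problem. In fact, I think the key to the problem is how to handle e. For example, take the combination te. If te appears in the middle of a word, it should of course count as a syllable. If it appears at the end of a word, however, the e should be treated as a silent e in English and should not count as a syllable.

That's it. I'd like to write down my idea in pseudocode: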

if (last character is e) {
    if (it is a silent e at the end of this word) {
        remove the silent e;
        count the rest as regular;
    } else {
        count++;
    }
} else {
    count it as regular;
}

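You may notice that I am not using only regular expressions to handle this problem. I did think about it: can this problem be solved with regular expressions alone? My answer is: no, I don't think so. At least for now, with the knowledge UCSD has given us, it is hard to do. Regular expressions are a powerful tool that can match the characters you want very quickly, but they lack some features. Taking te as an example again, a regular expression will not think twice when facing a word like teate (I made this word up). If our pattern treats the first te as a syllable, why would it not treat the last te the same way?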
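To make that limitation concrete, here is a minimal sketch that compares a regex-only count with one that removes a trailing silent e first. The class name RegexOnlyDemo and both method names are hypothetical, and the silent-e test is an assumption: a final e is treated as silent whenever another vowel appears earlier in the word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOnlyDemo {
    // One match per run of vowels, with any leading consonants attached.
    private static final Pattern VOWEL_GROUP = Pattern.compile("[^aeiouy]*[aeiouy]+");
    private static final Pattern VOWEL = Pattern.compile("[aeiouy]");

    // Regex-only count: every vowel group is a syllable, wherever it appears.
    static int regexOnly(String word) {
        Matcher m = VOWEL_GROUP.matcher(word.toLowerCase());
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Assumed rule: a final 'e' is silent when an earlier vowel exists,
    // so it is removed before the vowel groups are counted.
    static int withSilentE(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("e")) {
            String stem = w.substring(0, w.length() - 1);
            if (VOWEL.matcher(stem).find()) {
                return regexOnly(stem);
            }
        }
        return regexOnly(w);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ate", "teate", "the"}) {
            System.out.println(w + ": regex only = " + regexOnly(w)
                    + ", with silent-e step = " + withSilentE(w));
        }
    }
}
```

For ate and teate the regex-only count is 2 while the version with the silent-e step returns 1. For the, both return 1, because there is no earlier vowel and the final e is kept.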

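Meanwhile, UCSD actually addresses this on the assignment sheet. The hint there is that you should approach this problem with loops combined with regular expressions.

OK, it's finally time to show my code: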

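import java.util.regex.Matcher;
import java.util.regex.Pattern;

protected int countSyllables(String word)
{
    // TODO: Implement this method so that you can call it from the
    // getNumSyllables method in BasicDocument (module 1) and
    // EfficientDocument (module 2).
    int count = 0;
    word = word.toLowerCase();

    // Guard: charAt would throw on an empty string.
    if (word.isEmpty()) {
        return 0;
    }

    if (word.charAt(word.length()-1) == 'e') {
        if (silente(word)){
            // Silent e: drop it and count the rest as usual.
            String newword = word.substring(0, word.length()-1);
            count = count + countit(newword);
        } else {
            // The final e is the only vowel (as in "the"): one syllable.
            count++;
        }
    } else {
        count = count + countit(word);
    }
    return count;
}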

// Counts vowel groups: optional consonants followed by one or more vowels.
private int countit(String word) {
    int count = 0;
    Pattern splitter = Pattern.compile("[^aeiouy]*[aeiouy]+");
    Matcher m = splitter.matcher(word);

    while (m.find()) {
        count++;
    }
    return count;
}

// The final e is silent if any vowel appears before it.
private boolean silente(String word) {
    word = word.substring(0, word.length()-1);

    Pattern yup = Pattern.compile("[aeiouy]");
    Matcher m = yup.matcher(word);

    return m.find();
}

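You may notice that, besides the given countSyllables method, I created two additional methods, countit and silente. countit counts the syllables inside a word, and silente tries to work out whether the word ends with a silent e. Note how a non-silent e is defined here: the e in "the" is treated as not silent, while the e in "ate" is treated as silent.

My code passed both the local test cases and UCSD's OJ.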

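P.S. Using something like [^aeiouy] directly should be fine, because the word has already been parsed before this method is called. Converting to lowercase is also necessary, as it saves a lot of work handling uppercase letters. All we want is the number of syllables.
Speaking of the count, a more elegant approach would be to define count as static, so that the private methods can use count++ directly. But it is fine as it is for now.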
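Since the note above relies on each word being parsed before it is counted, here is a rough sketch of how that step could look. Note that String.split("[a-zA-Z]+") treats the letter runs as delimiters, so it returns the text between the words rather than the words themselves. This sketch collects the matches with a Matcher instead. The class name SyllableTotalSketch and the tokenize helper are hypothetical, and countSyllables here is a condensed, static restatement of the logic shown earlier, not the assignment's required structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyllableTotalSketch {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");
    private static final Pattern VOWEL_GROUP = Pattern.compile("[^aeiouy]*[aeiouy]+");
    private static final Pattern VOWEL = Pattern.compile("[aeiouy]");

    // Collect every run of letters as one word, already lowercased.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group().toLowerCase());
        }
        return words;
    }

    // Condensed form of the rule: drop a final 'e' when an earlier vowel
    // exists, then count the remaining vowel groups.
    static int countSyllables(String word) {
        String w = word;
        if (w.endsWith("e") && VOWEL.matcher(w.substring(0, w.length() - 1)).find()) {
            w = w.substring(0, w.length() - 1); // drop the silent e
        }
        int count = 0;
        Matcher m = VOWEL_GROUP.matcher(w);
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Sum the per-word counts over the whole text.
    static int getNumSyllables(String text) {
        int total = 0;
        for (String word : tokenize(text)) {
            total += countSyllables(word);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(getNumSyllables("The cat ate the cake.")); // prints 5
    }
}
```

Keeping the tokenizing in one place means punctuation and case are dealt with once, and the per-word method can assume a clean lowercase word.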

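If you still don't know how to approach this problem, feel free to contact me :)

Regarding java - How to count syllables in text using regular expressions and Java, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33425070/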