Problem description
I need to parse a Sentence into words and punctuation marks (whitespace counts as punctuation), then add all of them to a general ArrayList<Sentence>.
Example sentence:
I tried to read the whole sentence one character at a time, collecting consecutive characters of the same kind and creating a new Word or a new Punctuation from each collection.
Here is my code:
public class Sentence {

    private String sentence;
    private LinkedList<SentenceElement> elements;

    /**
     * Constructs a sentence.
     * @param aText a string containing all characters of the sentence
     */
    public Sentence(String aText) {
        sentence = aText.trim();
        splitSentence();
    }

    public String getSentence() {
        return sentence;
    }

    public LinkedList<SentenceElement> getElements() {
        return elements;
    }

    /**
     * Split sentance into words and punctuations
     */
    private void splitSentence() {
        if (sentence == "" || sentence == null || sentence == "\n") {
            return;
        }
        StringBuilder builder = new StringBuilder();
        int j = 0;
        boolean mark = false;
        while (j < sentence.length()) {
            //char current = sentence.charAt(j);
            while (Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Punctuation(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            }
            mark = true;
            while (!Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Word(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            }
            mark = true;
        }
    }
But the logic of splitSentence() doesn't work correctly, and I can't find the right solution for it.
I want to implement it like this: read the first character and add it to the builder; while the next character is of the same type (letter or punctuation), keep adding to the builder; when the next character differs in type from the builder's content, create a new Word or Punctuation and reset the builder. Then repeat the same logic.
How can I implement this checking logic correctly?
Recommended answer
Split the string on word boundaries (except the first):
String[] parts = sentence.split("(?<!^)\\b");
The array will contain alternating word/punctuation/word/punctuation/word etc.
Here is some test code:
String sentence = "A man, a plan, a canal — Panama!";
String[] parts = sentence.split("(?<!^)\\b");
for (String part : parts)
System.out.println('"' + part + "\" (" + (part.matches("\\w+") ? "word" : "punctuation") + ")");
Output:
"A" (word)
" " (punctuation)
"man" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"plan" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"canal" (word)
" — " (punctuation)
"Panama" (word)
"!" (punctuation)
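For comparison, the character-by-character logic described in the question can also be written as a single loop. The following is only a minimal sketch under stated assumptions: the class name SentenceSplitter and the method splitIntoRuns are hypothetical and not part of the original Sentence class, and the method returns plain strings rather than Word or Punctuation objects, since those classes are not shown in the question. It classifies characters with Character.isLetter, as the question's code does, so digits end up in the non-letter runs, which may differ from what the regex-based split produces.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSplitter {

    /**
     * Splits text into maximal runs of letters and non-letters.
     * Hypothetical helper, not part of the original Sentence class.
     */
    public static List<String> splitIntoRuns(String text) {
        List<String> runs = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return runs;
        }
        StringBuilder builder = new StringBuilder();
        // Type of the run currently in the builder: true = letters, false = anything else.
        boolean currentIsLetter = Character.isLetter(text.charAt(0));
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isLetter = Character.isLetter(c);
            if (isLetter != currentIsLetter) {
                // The type changed: flush the finished run and start a new one.
                runs.add(builder.toString());
                builder.setLength(0);
                currentIsLetter = isLetter;
            }
            builder.append(c);
        }
        // Flush the last run; the builder is non-empty because text is non-empty.
        runs.add(builder.toString());
        return runs;
    }

    public static void main(String[] args) {
        for (String run : splitIntoRuns("A man, a plan!")) {
            String kind = Character.isLetter(run.charAt(0)) ? "word" : "punctuation";
            System.out.println("\"" + run + "\" (" + kind + ")");
        }
    }
}
```

Each run could then be wrapped in the question's Word or Punctuation class, depending on whether its first character is a letter. The index is bounds-checked in one place only (the for condition), so the loop cannot read past the end of the string the way nested loops that each advance the index can.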