Problem description
I need to parse a Sentence into words and punctuation (whitespace counts as punctuation), then add all of the pieces to a generic ArrayList<Sentence>.
Example sentence:
A man, a plan, a canal — Panama!
A => word
whitespace => punctuation
man => word
, + space => punctuation
a => word
[...]
I tried to read the whole sentence one character at a time, collect consecutive characters of the same kind, and create a new Word or a new Punctuation from each collected run.
Here is my code:
public class Sentence {
private String sentence;
private LinkedList<SentenceElement> elements;
/**
* Constructs a sentence.
* @param aText a string containing all characters of the sentence
*/
public Sentence(String aText) {
sentence = aText.trim();
splitSentence();
}
public String getSentence() {
return sentence;
}
public LinkedList<SentenceElement> getElements() {
return elements;
}
/**
* Splits the sentence into words and punctuation.
*/
private void splitSentence() {
if (sentence == "" || sentence == null || sentence == "\n") {
return;
}
StringBuilder builder = new StringBuilder();
int j = 0;
boolean mark = false;
while (j < sentence.length()) {
//char current = sentence.charAt(j);
while (Character.isLetter(sentence.charAt(j))) {
if (mark) {
elements.add(new Punctuation(builder.toString()));
builder.setLength(0);
mark = false;
}
builder.append(sentence.charAt(j));
j++;
}
mark = true;
while (!Character.isLetter(sentence.charAt(j))) {
if (mark) {
elements.add(new Word(builder.toString()));
builder.setLength(0);
mark = false;
}
builder.append(sentence.charAt(j));
j++;
}
mark = true;
}
}
But the logic of splitSentence() doesn't work correctly, and I can't find the right solution for it.
What I want to implement: read the first character and add it to the builder; while the next character is of the same type (letter or punctuation), keep adding to the builder; when the next character differs in type from the builder's content, create a new Word or Punctuation and reset the builder. Then repeat the same logic.
How do I implement this check correctly?
Recommended answer
Split the string on word boundaries (except the first):
String[] parts = sentence.split("(?<!^)\\b");
The array will contain alternating word/punctuation/word/punctuation/word, and so on.
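To connect the split back to the question's element list, here is a minimal sketch of wrapping each piece in a typed element. The question does not show SentenceElement, Word, or Punctuation, so the versions below are hypothetical stand-ins, and SplitDemo and toElements are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: the question does not show these classes.
abstract class SentenceElement {
    final String text;
    SentenceElement(String text) { this.text = text; }
}

class Word extends SentenceElement {
    Word(String text) { super(text); }
}

class Punctuation extends SentenceElement {
    Punctuation(String text) { super(text); }
}

class SplitDemo {
    // Splits on word boundaries and wraps each piece in Word or Punctuation.
    static List<SentenceElement> toElements(String sentence) {
        List<SentenceElement> elements = new ArrayList<>();
        if (sentence == null || sentence.isEmpty()) {
            return elements; // nothing to split
        }
        for (String part : sentence.split("(?<!^)\\b")) {
            // By default \w+ matches runs of ASCII letters, digits and underscores.
            elements.add(part.matches("\\w+") ? new Word(part) : new Punctuation(part));
        }
        return elements;
    }

    public static void main(String[] args) {
        for (SentenceElement e : toElements("A man, a plan")) {
            System.out.println(e.getClass().getSimpleName() + ": \"" + e.text + "\"");
        }
    }
}
```

The empty-string guard is there because "".split(...) returns a one-element array containing the empty string, which would otherwise become an empty Punctuation.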
Here is some test code:
String sentence = "A man, a plan, a canal — Panama!";
String[] parts = sentence.split("(?<!^)\\b");
for (String part : parts)
    System.out.println('"' + part + "\" (" + (part.matches("\\w+") ? "word" : "punctuation") + ")");
Output:
"A" (word)
" " (punctuation)
"man" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"plan" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"canal" (word)
" — " (punctuation)
"Panama" (word)
"!" (punctuation)
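For comparison, the character-by-character approach described in the question can also be made to work. As posted, the loop appears to have three problems: the elements list is never initialized, the inner while loops call charAt(j) without checking j against the length, and the last run is never flushed. Below is a minimal single-pass sketch that avoids these; CharScanner, Token, and scan are hypothetical names, with Token standing in for the question's Word and Punctuation classes.

```java
import java.util.ArrayList;
import java.util.List;

class CharScanner {
    // Hypothetical token type standing in for the question's Word/Punctuation classes.
    static final class Token {
        final String text;
        final boolean isWord;
        Token(String text, boolean isWord) {
            this.text = text;
            this.isWord = isWord;
        }
    }

    // Groups consecutive characters of the same kind (letter vs. non-letter) into tokens.
    static List<Token> scan(String sentence) {
        List<Token> tokens = new ArrayList<>();
        if (sentence == null || sentence.isEmpty()) {
            return tokens;
        }
        StringBuilder builder = new StringBuilder();
        boolean inWord = Character.isLetter(sentence.charAt(0));
        for (int i = 0; i < sentence.length(); i++) {
            char c = sentence.charAt(i);
            boolean letter = Character.isLetter(c);
            if (letter != inWord) {
                // The kind changed: flush the finished run and start a new one.
                tokens.add(new Token(builder.toString(), inWord));
                builder.setLength(0);
                inWord = letter;
            }
            builder.append(c);
        }
        // Flush the final run, which the loop never emits on its own.
        tokens.add(new Token(builder.toString(), inWord));
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : scan("A man, a plan, a canal - Panama!")) {
            System.out.println("\"" + t.text + "\" (" + (t.isWord ? "word" : "punctuation") + ")");
        }
    }
}
```

One difference from the regex version: Character.isLetter treats digits and underscores as non-letters and accepts non-ASCII letters, whereas \w by default covers only ASCII letters, digits, and underscores. Which behavior is preferable depends on the input.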