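I have a lot of words like the ones below, and I want to split each line at the first space into a word and its meaning. I am using Apache POI for this because I have to read a docx file and then get the data out of it.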
abash humiliate, embarrass
abdicate relinquish power or position
aberrant abnormal
abet aid, encourage (typically of crime)
abeyance postponement
aboriginal indigenous
abridge shorten
abstemious moderate
...
So which regular expression would suit my purpose, so that I can display it like this:
word :abash
meaning : humiliate, embarrass
...
My code is:
import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("E:\\important.docx");
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            // Print the full text of the document
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
- Edit -
Based on the suggested answer, I am now using
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("E:\\Words.docx");
        org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
        //System.out.print(oleTextExtractor.getText());
        Scanner sc = new Scanner(oleTextExtractor.getText());
        while(sc.hasNextLine()) {
            String line = sc.nextLine();
            int i = line.indexOf(' ');
            String word = line.substring(0, i);
            String meaning = line.substring(i).trim();
            System.out.println("word "+word);
            System.out.println("meaning "+meaning);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But I get
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at WordFileReader.main(WordFileReader.java:25)
Best answer
I would use java.util.Scanner to extract the lines from the text
Scanner sc = new Scanner(oleTextExtractor.getText());
while(sc.hasNextLine()) {
String line = sc.nextLine();
...
Then I would split each line into the word and its meaning
int i = line.indexOf(' ', 2); // start from pos 2 to skip a leading article "a"
String word = line.substring(0, i);
String meaning = line.substring(i).trim();
or
String[] parts = line.split("(?<!^a)\\s+", 2);
String word = parts[0];
String meaning = parts[1];
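Both variants above assume that every line contains a space. When a line has none, `indexOf` returns -1 and `substring(0, -1)` throws the `StringIndexOutOfBoundsException` shown in the question; likewise `parts[1]` fails when `split` returns a single element. A likely cause, though this is only an assumption about the docx contents, is blank or single-token lines in the extracted text. The following is a minimal sketch that skips such lines; the class name `WordMeaningSplitter` and its helper methods are hypothetical, and the input is a hard-coded string standing in for `oleTextExtractor.getText()`.

```java
import java.util.Scanner;

public class WordMeaningSplitter {

    // indexOf variant: splits at the first space.
    // Returns null (instead of throwing) when the line contains no space.
    static String[] splitAtFirstSpace(String line) {
        String trimmed = line.trim();
        int i = trimmed.indexOf(' ');
        if (i < 0) {
            return null; // blank line or a single token: nothing to split
        }
        return new String[] { trimmed.substring(0, i), trimmed.substring(i + 1).trim() };
    }

    // Regex variant, using the same pattern as the answer: split on whitespace
    // unless it directly follows a lone "a" at the start of the line.
    // Returns null when there is no second part.
    static String[] splitWithRegex(String line) {
        String[] parts = line.trim().split("(?<!^a)\\s+", 2);
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted from the docx (assumption: it may contain blank lines)
        String text = "abash humiliate, embarrass\n\nabridge shorten\n";
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String[] entry = splitAtFirstSpace(sc.nextLine());
                if (entry == null) {
                    continue; // skip lines that cannot be split
                }
                System.out.println("word : " + entry[0]);
                System.out.println("meaning : " + entry[1]);
            }
        }
    }
}
```

The `indexOf` helper here splits at the very first space, as the question asks; passing 2 as the second argument to `indexOf`, as in the answer, would additionally skip a leading article.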
Regarding java - splitting a sentence into two strings and displaying them repeatedly, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17019776/