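I have a lot of entries like the ones below, and I want to split each line at the first space into a word and its meaning. I am using Apache POI for this, because I have to read a docx file and then extract the data from it.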

    abash  humiliate, embarrass
    abdicate  relinquish power or position
    aberrant  abnormal
    abet  aid, encourage (typically of crime)
    abeyance  postponement
    aboriginal  indigenous
    abridge  shorten
    abstemious  moderate
...


So which regular expression would suit my purpose, so that I can display each entry like this:

word :abash
meaning : humiliate, embarrass
...


My code is:

import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {

    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("E:\\important.docx");
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            // Print the plain text of the whole document
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}


- Edit -
Following the suggested answer, I am now using

public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("E:\\Words.docx");
        org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
        //System.out.print(oleTextExtractor.getText());

        Scanner sc = new Scanner(oleTextExtractor.getText());
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            int i = line.indexOf(' ');
            String word = line.substring(0, i);
            String meaning = line.substring(i).trim();

            System.out.println("word " + word);
            System.out.println("meaning " + meaning);
        }

    } catch (Exception e) {
        e.printStackTrace();
    }

}


But I get

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at WordFileReader.main(WordFileReader.java:25)

Accepted answer

I would use java.util.Scanner to read the extracted text line by line:

Scanner sc = new Scanner(oleTextExtractor.getText());
while(sc.hasNextLine()) {
    String line = sc.nextLine();
    ...


Then I would split each line into the word and its meaning:

 int i = line.indexOf(' ', 2);  // search from index 2 so the space after a leading article "a" is skipped
 String word = line.substring(0, i);
 String meaning = line.substring(i).trim();
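
The `StringIndexOutOfBoundsException` in the question is what `substring(0, i)` throws when `i` is -1, and `indexOf` returns -1 for any line that contains no space. A likely cause, though this is only an assumption about the docx content, is blank lines or single-word lines in the extracted text. The following is a minimal sketch of a guarded split; the class name `EntrySplitter`, the method `splitEntry`, and the hard-coded sample text standing in for `oleTextExtractor.getText()` are all hypothetical.

```java
import java.util.Scanner;

public class EntrySplitter {

    /**
     * Splits a line into { word, meaning } at the first space found from index 2 on.
     * Returns null when the line has no such space (for example a blank line).
     */
    public static String[] splitEntry(String line) {
        String trimmed = line.trim();          // drop indentation and trailing spaces
        int i = trimmed.indexOf(' ', 2);       // -1 when there is nothing to split on
        if (i < 0) {
            return null;
        }
        return new String[] { trimmed.substring(0, i), trimmed.substring(i).trim() };
    }

    public static void main(String[] args) {
        // Assumed sample input; note the blank line between the two entries
        String text = "abash  humiliate, embarrass\n\nabet  aid, encourage (typically of crime)\n";
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String[] entry = splitEntry(sc.nextLine());
                if (entry == null) {
                    continue;                  // skip lines that cannot be split
                }
                System.out.println("word :" + entry[0]);
                System.out.println("meaning : " + entry[1]);
            }
        }
    }
}
```

Returning `null` keeps the sketch short; skipping the line is one reasonable choice, and logging it instead would help when tracking down malformed entries.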


Alternatively, split on the first run of whitespace with a regular expression:

 String[] parts = line.split("(?<!^a)\\s+", 2);
 String word = parts[0];
 String meaning = parts[1];
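
The `split` variant has the same weak spot as `indexOf`: for a line without whitespace the array has only one element, so `parts[1]` throws `ArrayIndexOutOfBoundsException`. Below is a rough sketch that checks the array length and prints entries in the format requested in the question. The class name `WordMeaningPrinter`, the method `format`, and the sample text are made up for illustration; the text stands in for `oleTextExtractor.getText()`.

```java
import java.util.Scanner;

public class WordMeaningPrinter {

    /**
     * Formats one line as "word :X" and "meaning : Y" on two lines.
     * Returns null when the line cannot be split into two parts.
     */
    public static String format(String line) {
        // Trim first so that "^" in the lookbehind refers to the first real character.
        // The limit of 2 means only the first qualifying whitespace run is used.
        String[] parts = line.trim().split("(?<!^a)\\s+", 2);
        if (parts.length < 2 || parts[1].isEmpty()) {
            return null;
        }
        return "word :" + parts[0] + "\nmeaning : " + parts[1];
    }

    public static void main(String[] args) {
        // Assumed sample input, including indentation and a blank line
        String text = "    abash  humiliate, embarrass\n\n    aberrant  abnormal\n";
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String formatted = format(sc.nextLine());
                if (formatted != null) {
                    System.out.println(formatted);
                }
            }
        }
    }
}
```

Because of the negative lookbehind, a line that starts with the article "a" followed by a space is split at the second whitespace run, so the article stays attached to the word.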

Regarding "java - splitting a sentence into two strings and displaying them repeatedly", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17019776/
