I have the following text: Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408

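I need to check whether the text contains Invoice and 2019 (a 4-digit number), and whether that 4-digit number is followed by another number with n digits. So I want to read the Invoice name, skip the first line and then get the element on the second line. This is what I have so far: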


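    import java.io.File;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.pdfbox.multipdf.Splitter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    String readInvoiceNumber() throws IOException { // hypothetical method name; this.fileName is the PDF path
        File file = new File(this.fileName); // creating file object with String path
        // "Invoice", any text (also across line breaks), then a 4-digit year and the n-digit number
        final Pattern invoice = Pattern.compile("Invoice.*?\\b(\\d{4})\\s+(\\d+)\\b", Pattern.DOTALL);
        String resultInvoiceNumber = "";
        try (PDDocument pdDocument = PDDocument.load(file)) { // PDFBox 2.x API; closed automatically
            Splitter splitter = new Splitter(); // splitter that takes care of splitting pages
            PDFTextStripper stripper = new PDFTextStripper(); // stripper strips text and ignores all formatting
            for (PDDocument pd : splitter.split(pdDocument)) { // looping through the split pages
                String s = stripper.getText(pd); // text of a single page
                pd.close(); // every split page is a separate document and must be closed
                Matcher matcher = invoice.matcher(s);
                if (resultInvoiceNumber.isEmpty() && matcher.find()) {
                    resultInvoiceNumber = matcher.group(2); // the number after the 4-digit year
                }
            }
        }
        return resultInvoiceNumber;
    }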
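The question describes a line-based idea: find the line that starts with Invoice, then look at that line and the lines after it for the 4-digit year followed by the n-digit number. The sketch below shows that idea on plain page text, without PDFBox. It assumes the extracted text separates lines with ordinary line breaks; the class name InvoiceLineScan and the method findInvoiceNumber are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceLineScan {

    // A 4-digit year, whitespace, then the invoice number (captured as group 2).
    private static final Pattern YEAR_AND_NUMBER = Pattern.compile("\\b(\\d{4})\\s+(\\d+)\\b");

    /**
     * Hypothetical helper: finds the first line starting with "Invoice", then searches
     * that line and the following lines for "<4-digit year> <number>".
     * Returns the number, or null if nothing matches.
     */
    public static String findInvoiceNumber(String pageText) {
        String[] lines = pageText.split("\\R"); // \R matches any line break (\n, \r\n, ...)
        for (int i = 0; i < lines.length; i++) {
            if (!lines[i].trim().startsWith("Invoice")) {
                continue; // not the Invoice line yet
            }
            for (int j = i; j < lines.length; j++) { // the Invoice line itself, then the lines after it
                Matcher m = YEAR_AND_NUMBER.matcher(lines[j]);
                if (m.find()) {
                    return m.group(2);
                }
            }
            return null; // "Invoice" found, but no year/number after it
        }
        return null; // no line starts with "Invoice"
    }

    public static void main(String[] args) {
        String oneLine = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
        String twoLines = "Invoice n.ro per 006390\nBENETTON RUSSIA OOO 2019 0051035408";
        System.out.println(findInvoiceNumber(oneLine));  // 0051035408
        System.out.println(findInvoiceNumber(twoLines)); // 0051035408
    }
}
```

The leading \b keeps digits in the middle of 006390 from being read as the year, and \s+ requires whitespace right after the four digits, so only "2019 0051035408" qualifies in the sample text.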

Best answer

You can try the following, based on groups:

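    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexpTest {

        public static void main(String[] args) {
            final String input = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
            // group 1: "Invoice" zero or more times; group 2: 4 digits, whitespace, then n digits
            final Pattern pattern = Pattern.compile("(Invoice)*(\\s*\\d{4}\\s+\\d+\\s*)");

            final Matcher matcher = pattern.matcher(input);
            System.out.println(matcher.find());  // whether a match was found
            System.out.println(matcher.group()); // the whole matched text
        }
    }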


Output:

true
 2019 0051035408
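Because (Invoice)* may repeat zero times, the pattern above also matches text that contains no Invoice at all, and matcher.group() returns the whole match, including the leading space. A stricter sketch, assuming Java 7+ named groups and that the extracted PDF text may span several lines (hence Pattern.DOTALL), could require Invoice and return the two numbers separately. The class name InvoiceRegex and the method parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceRegex {

    // "Invoice", then any text (lazily, across line breaks thanks to DOTALL),
    // then a 4-digit year and the invoice number as named groups.
    private static final Pattern INVOICE = Pattern.compile(
            "Invoice.*?\\b(?<year>\\d{4})\\s+(?<number>\\d+)\\b", Pattern.DOTALL);

    /** Returns {year, number}, or null when the text has no "Invoice ... yyyy n..." sequence. */
    public static String[] parse(String text) {
        Matcher m = INVOICE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group("year"), m.group("number")};
    }

    public static void main(String[] args) {
        String input = "Invoice n.ro per 006390 BENETTON RUSSIA OOO 2019 0051035408";
        String[] result = parse(input);
        System.out.println(result[0]); // 2019
        System.out.println(result[1]); // 0051035408
        System.out.println(parse("Receipt 2019 0051035408") == null); // true: no "Invoice"
    }
}
```

The lazy .*? stops at the first position where a 4-digit year followed by whitespace and digits can match, so in the sample text it skips 006390 and lands on 2019.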

Regarding "java - Matching 3 strings with a regular expression", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59209545/

10-13 01:16