java - Reading the first N characters of a PDF file using PDFBox

I wrote the following function, which uses PDFBox to extract the text of a PDF:

    private String readFirstNChars(int N) { // N has not been used yet
        PDFTextStripper pdfTextStripper = null;
        PDDocument pdDocument = null;
        COSDocument cosDocument = null;
        File currentFile = this.pdfFile;
        try {
            PDFParser parser = new PDFParser(new RandomAccessBufferedFileInputStream(currentFile));
            parser.parse();
            cosDocument = parser.getDocument();
            pdfTextStripper = new PDFTextStripper();
            pdDocument = new PDDocument(cosDocument);
            pdfTextStripper.setStartPage(1);
            pdfTextStripper.setEndPage(1);
            String parsedText = pdfTextStripper.getText(pdDocument);
            return parsedText;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

My plan was to return the first N characters of parsedText. However, the files I read may be very large, and in that case it makes little sense to load the whole text into memory only to keep N characters of it. Is there a way to read only N characters from the PDF?

Best answer

You will probably need to look at the source code of PDFParser so that you can write a suitable method, or write your own parser. A PDF contains much more than readable text, so in essence you have to parse the document, discard everything that is not readable text, and count the actual text you find.
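One way to do the "count the actual text you find" part without touching the parser is to count on the output side. The sketch below is not from the answer: BoundedWriter, LimitReachedException and FirstNCharsDemo are hypothetical names, and it relies only on java.io. It keeps at most N characters and then throws, which aborts whatever is producing the text. The assumption is that you pass it to PDFTextStripper.writeText(PDDocument, Writer) instead of calling getText, and that PDFBox lets the exception propagate; check that against the version you use. Note that the PDF structure is still parsed either way, so this only avoids building the full extracted string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;

/** Hypothetical marker exception: thrown once the character limit is reached. */
class LimitReachedException extends IOException {
    LimitReachedException(int limit) {
        super("character limit of " + limit + " reached");
    }
}

/** Hypothetical Writer that keeps at most `limit` characters, then aborts the producer. */
class BoundedWriter extends Writer {
    private final StringBuilder buffer;
    private final int limit;

    BoundedWriter(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
        this.buffer = new StringBuilder(Math.min(limit, 8192));
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // Keep only as many characters as still fit under the limit.
        int remaining = limit - buffer.length();
        buffer.append(cbuf, off, Math.min(len, remaining));
        if (buffer.length() >= limit) {
            // Signal the caller that no further text is wanted.
            throw new LimitReachedException(limit);
        }
    }

    @Override
    public void flush() {
        // Nothing to flush: everything is held in memory.
    }

    @Override
    public void close() {
        // Nothing to release.
    }

    String captured() {
        return buffer.toString();
    }
}

class FirstNCharsDemo {
    /** Feeds text chunks to a BoundedWriter, the way an extractor would, and returns what was kept. */
    static String firstN(Iterable<String> chunks, int n) {
        BoundedWriter out = new BoundedWriter(n);
        try {
            for (String chunk : chunks) {
                out.write(chunk);
            }
            // Assumed PDFBox equivalent of the loop above (not compiled here):
            //   new PDFTextStripper().writeText(pdDocument, out);
        } catch (LimitReachedException expected) {
            // Limit reached: stop early and keep what was captured.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.captured();
    }

    public static void main(String[] args) {
        // prints "Hello, PD"
        System.out.println(firstN(Arrays.asList("Hello, ", "PDF ", "world"), 9));
    }
}
```

If the document has fewer than N characters, the exception is never thrown and captured() simply returns all of the text. Combining this with setStartPage and setEndPage, as in the question, should further limit how many pages are processed.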