本文介绍了PDFBox 2.0 RC3-查找和替换文本的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
一个人如何使用PDFBox 2.0在PDF文档中查找和替换文本,他们提出了一个旧示例,它的语法不再起作用,所以我想知道是否仍然可行,以及这样做的最佳方法是什么.谢谢!
How can one find and replace text inside a PDF document using PDFBox 2.0, they pulled the old example and it's syntax no longer works so I am wondering if it's still possible and if so what the best way to go about it is. Thanks!
推荐答案
您可以尝试以下操作:
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
if (Strings.isEmpty(searchString) || Strings.isEmpty(replacement)) {
return document;
}
PDPageTree pages = document.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof Operator) {
Operator op = (Operator) next;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj")) {
// Tj takes one operator and that is the string to display so lets update that operator
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else if (op.getName().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++) {
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString) {
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
string = StringUtils.replaceOnce(string, searchString, replacement);
cosString.setValue(string.getBytes());
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(document);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
page.setContents(updatedStream);
out.close();
}
return document;
}
这篇关于PDFBox 2.0 RC3-查找和替换文本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!