Problem description
Can we create a new custom PDFOperator (for example, PDFOperator{BDC}) and COSBase objects (for example, COSName{P} COSName{Prop1}, where Prop1 in turn references another object), and add them to the root structure of a PDF?
I have read a list of parser tokens from an existing PDF document, and I want to tag the PDF. My plan is to first manipulate the token list by inserting newly created COSBase objects, and finally add them to the root structure tree. How can I create these COSBase objects? The code I am using to extract the tokens from the PDF is:
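import java.io.File;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

// PDFBox 2.x API; the document is rewritten in place, page by page.
PDDocument document = PDDocument.load(new File(inputPdfFile));
for (PDPage page : document.getPages())
{
    PDFStreamParser parser = new PDFStreamParser(page);
    parser.parse();
    List<Object> tokens = parser.getTokens();
    // Start a fresh token list for every page.
    List<Object> newTokens = new ArrayList<>();
    for (Object token : tokens) {
        System.out.println(token);
        if (token instanceof Operator) {
            Operator op = (Operator) token;
            // Inspect op.getName() here to decide where to insert new tokens.
        }
        // Copy every token, not only the last one.
        newTokens.add(token);
    }
    // Write the token list back as this page's content stream.
    PDStream newContents = new PDStream(document);
    OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
    ContentStreamWriter writer = new ContentStreamWriter(out);
    writer.writeTokens(newTokens);
    out.close();
    page.setContents(newContents);
}
document.save(outputPdfFile);
document.close();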
The above code creates a new PDF with all formatting and images preserved. The newTokens list contains all the existing COSBase objects, so I want to insert some tagging COSBase objects into it. When I save the new document, it should then be tagged without my having to handle any decoding, encoding, fonts, or images.
First, will this idea work? If so, please help me with some code to create custom COSBase objects. I am very new to Java.
Recommended answer
Depending on your document's format, you can insert marked content.
// The code below adds "/P <</MCID 0>> BDC" to the token list.
// Assumes mcid (int counter), page (PDPage), currentSection (parent structure
// element) and existing_token are already defined in the surrounding code.
newTokens.add(COSName.getPDFName("P"));
COSDictionary currentMarkedContentDictionary = new COSDictionary();
currentMarkedContentDictionary.setInt(COSName.MCID, mcid);
mcid++;
newTokens.add(currentMarkedContentDictionary);
newTokens.add(Operator.getOperator("BDC"));
// After adding the MCID, append your existing tokens (TJ, TD, Td, T*, ...).
newTokens.add(existing_token);
// Close the marked-content sequence with EMC.
newTokens.add(Operator.getOperator("EMC"));
// Add the marked content to the root structure tree.
PDStructureElement structureElement = new PDStructureElement(StandardStructureTypes.P, currentSection);
structureElement.setPage(page);
PDMarkedContent markedContent = new PDMarkedContent(COSName.P, currentMarkedContentDictionary);
structureElement.appendKid(markedContent);
currentSection.appendKid(structureElement);
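To make the token-wrapping step easier to follow, here is a minimal, library-independent sketch of the same idea. It is not PDFBox code: the tokens are plain strings standing in for PDFBox's COSBase and Operator objects, and the names MarkedContentSketch and wrapTextObjects are hypothetical. It assumes you want to wrap each BT ... ET text object in its own marked-content sequence, which may not match how your document should be split into paragraphs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: strings stand in for PDFBox tokens.
final class MarkedContentSketch {

    // Wraps every BT ... ET text object in "/P <</MCID n>> BDC ... EMC".
    public static List<String> wrapTextObjects(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int mcid = 0; // each marked-content sequence gets its own MCID
        for (String token : tokens) {
            if (token.equals("BT")) {
                // Open the marked-content sequence before the text object.
                out.add("/P");
                out.add("<</MCID " + mcid + ">>");
                out.add("BDC");
                mcid++;
            }
            out.add(token); // existing tokens are copied unchanged
            if (token.equals("ET")) {
                // Close the sequence right after the text object ends.
                out.add("EMC");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("BT", "/F1", "12", "Tf", "(Hello)", "Tj", "ET");
        // prints: /P <</MCID 0>> BDC BT /F1 12 Tf (Hello) Tj ET EMC
        System.out.println(String.join(" ", wrapTextObjects(tokens)));
    }
}
```

With real PDFBox tokens, the string comparison would presumably become a check such as token instanceof Operator followed by a comparison on ((Operator) token).getName(), and the three inserted strings would be the COSName, COSDictionary and BDC Operator objects built in the answer's code.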
Thanks to @Tilman Hausherr