This article describes how to trim whitespace from pages using iTextPDF.

Problem description


I have a PDF that consists of some data followed by some whitespace. I don't know how large the data is, but I'd like to trim off the whitespace following the data.

    PdfReader reader = new PdfReader(PDFLOCATION);
    Rectangle rect = new Rectangle(700, 2000);
    Document document = new Document(rect);
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(SAVELCATION));

    document.open();

    int n = reader.getNumberOfPages();
    PdfImportedPage page;
    for (int i = 1; i <= n; i++) {
        document.newPage();
        page = writer.getImportedPage(reader, i);
        Image instance = Image.getInstance(page);
        document.add(instance);
    }
    document.close();

Is there a way to clip/trim the whitespace for each page in the new document? This PDF contains vector graphics.

I'm using iTextPDF, but can switch to any Java library (mavenized, Apache license preferred).

Solution

As no actual solution has been posted, here are some pointers from the accompanying itext-questions mailing list thread:

  1. As you want to merely trim pages, this is not a case of PdfWriter + getImportedPage usage but instead of PdfStamper usage. Your main code using a PdfStamper might look like this:

    PdfReader reader = new PdfReader(resourceStream);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("target/test-outputs/test-trimmed-stamper.pdf"));
    
    // Go through all pages
    int n = reader.getNumberOfPages();
    for (int i = 1; i <= n; i++)
    {
        Rectangle pageSize = reader.getPageSize(i);
        Rectangle rect = getOutputPageSize(pageSize, reader, i);
    
        PdfDictionary page = reader.getPageN(i);
        page.put(PdfName.CROPBOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
        stamper.markUsed(page);
    }
    stamper.close();
    

    As you can see, I also added another argument, the page number, to your prospective getOutputPageSize method. After all, the amount of white space to trim might differ from page to page.

  2. If the source document did not contain vector graphics, you could simply use the iText parser package classes. There even already is a TextMarginFinder based on them. In this case the getOutputPageSize method (with the additional page parameter) could look like this:

    private Rectangle getOutputPageSize(Rectangle pageSize, PdfReader reader, int page) throws IOException
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        TextMarginFinder finder = parser.processContent(page, new TextMarginFinder());
        Rectangle result = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
        System.out.printf("Text/bitmap boundary: %f,%f to %f, %f\n", finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
        return result;
    }
    

    Using this method with your file test.pdf trims each page according to the text (and bitmap image) content on it.

  3. To find the bounding box respecting vector graphics, too, you essentially have to do the same but you have to extend the parser framework used here to inform its listeners (the TextMarginFinder essentially is a listener to drawing events sent from the parser framework) about vector graphics operations, too. This is non-trivial, especially if you don't know PDF syntax by heart yet.

  4. If your PDFs to trim are not too generic but can be forced to include some text or bitmap graphics in relevant positions, though, you could use the sample code above (probably with minor changes) anyways.

    E.g. if your PDFs always start with text on top and end with text at the bottom, you could change getOutputPageSize to create the result rectangle like this:

    Rectangle result = new Rectangle(pageSize.getLeft(), finder.getLly(), pageSize.getRight(), finder.getUry());
    

    This only trims the empty space at the top and bottom.

    Depending on your input data pool and requirements this might suffice.

    Or you can use other heuristics, depending on what you know about the input data. If you know something about the positioning of text (e.g. that the heading is always centered and some other text always starts at the left), you can easily extend the TextMarginFinder to take advantage of this knowledge.
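The rectangle handling described in the pointers above can be sketched without iText at all. The following is a minimal sketch, not part of the original answer: the class `CropBoxSketch`, the method `cropBox`, and the `padding` parameter are hypothetical names chosen for illustration, and `java.awt.geom.Rectangle2D` stands in for iText's `Rectangle`. It takes the page bounds and the content bounds (as a margin finder would report them), optionally keeps the full page width, and returns the values in the `{llx, lly, urx, ury}` order used for the crop box array in the stamper code.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;

class CropBoxSketch {
    /**
     * Combines page bounds and detected content bounds into a crop box.
     * verticalOnly keeps the full page width and trims only top and bottom.
     * padding (a hypothetical extra) leaves a margin around the content.
     * The result is clamped so it never extends beyond the page.
     */
    static float[] cropBox(Rectangle2D page, Rectangle2D content,
                           boolean verticalOnly, double padding) {
        double llx = verticalOnly ? page.getMinX() : content.getMinX() - padding;
        double urx = verticalOnly ? page.getMaxX() : content.getMaxX() + padding;
        double lly = content.getMinY() - padding;
        double ury = content.getMaxY() + padding;
        return new float[] {
            (float) Math.max(llx, page.getMinX()),
            (float) Math.max(lly, page.getMinY()),
            (float) Math.min(urx, page.getMaxX()),
            (float) Math.min(ury, page.getMaxY())
        };
    }

    public static void main(String[] args) {
        // Page size from the question; the content bounds are made-up sample values.
        Rectangle2D page = new Rectangle2D.Double(0, 0, 700, 2000);
        Rectangle2D content = new Rectangle2D.Double(50, 1500, 400, 450);
        // Trim on all four sides: prints [40.0, 1490.0, 460.0, 1960.0]
        System.out.println(Arrays.toString(cropBox(page, content, false, 10)));
        // Trim top and bottom only: prints [0.0, 1490.0, 700.0, 1960.0]
        System.out.println(Arrays.toString(cropBox(page, content, true, 10)));
    }
}
```

In an iText 5 setting the four values would presumably be passed to `new PdfArray(new float[]{...})` as in the stamper loop; clamping to the page is a design choice of this sketch so that padding never enlarges the visible area.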


Recent (April 2015, iText 5.5.6-SNAPSHOT) improvements

The current development version, 5.5.6-SNAPSHOT, extends the parser package to also include vector graphics parsing. This allows for an extension of iText's original TextMarginFinder class implementing the new ExtRenderListener methods like this:

    @Override
    public void modifyPath(PathConstructionRenderInfo renderInfo)
    {
        // Collect the points this path construction operation contributes.
        List<Vector> points = new ArrayList<Vector>();
        if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT)
        {
            // A rectangle is given as x, y, width, height; use its four corners.
            float x = renderInfo.getSegmentData().get(0);
            float y = renderInfo.getSegmentData().get(1);
            float w = renderInfo.getSegmentData().get(2);
            float h = renderInfo.getSegmentData().get(3);
            points.add(new Vector(x, y, 1));
            points.add(new Vector(x+w, y, 1));
            points.add(new Vector(x, y+h, 1));
            points.add(new Vector(x+w, y+h, 1));
        }
        else if (renderInfo.getSegmentData() != null)
        {
            // Other operations: treat the segment data as a sequence of x/y pairs.
            for (int i = 0; i < renderInfo.getSegmentData().size()-1; i+=2)
            {
                points.add(new Vector(renderInfo.getSegmentData().get(i), renderInfo.getSegmentData().get(i+1), 1));
            }
        }

        // Transform each point by the current transformation matrix and grow the path rectangle.
        for (Vector point: points)
        {
            point = point.cross(renderInfo.getCtm());
            Rectangle2D.Float pointRectangle = new Rectangle2D.Float(point.get(Vector.I1), point.get(Vector.I2), 0, 0);
            if (currentPathRectangle == null)
                currentPathRectangle = pointRectangle;
            else
                currentPathRectangle.add(pointRectangle);
        }
    }

    @Override
    public Path renderPath(PathPaintingRenderInfo renderInfo)
    {
        // Only painted paths count; the null check guards against painting an empty path.
        if (renderInfo.getOperation() != PathPaintingRenderInfo.NO_OP && currentPathRectangle != null)
        {
            if (textRectangle == null)
                textRectangle = currentPathRectangle;
            else
                textRectangle.add(currentPathRectangle);
        }
        currentPathRectangle = null;

        return null;
    }

    @Override
    public void clipPath(int rule)
    {
        // Clip paths are ignored in this proof of concept.
    }

(Full source: MarginFinder.java)
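The core of modifyPath, transforming each path point by the current transformation matrix and growing a bounding rectangle, can be tried out in isolation. The sketch below is an illustration only and not taken from MarginFinder.java: the class `PathBoundsSketch` and its methods are hypothetical, and `java.awt.geom.AffineTransform` is assumed here as a stand-in for the matrix the iText parser supplies.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

class PathBoundsSketch {
    private Rectangle2D bounds; // stays null until the first point arrives

    /** Transforms one point by the given matrix and grows the bounds to include it. */
    void addPoint(double x, double y, AffineTransform ctm) {
        Point2D p = ctm.transform(new Point2D.Double(x, y), null);
        if (bounds == null) {
            bounds = new Rectangle2D.Double(p.getX(), p.getY(), 0, 0);
        } else {
            bounds.add(p);
        }
    }

    /** A rectangle operand (x, y, width, height) contributes its four corners. */
    void addRect(double x, double y, double w, double h, AffineTransform ctm) {
        addPoint(x, y, ctm);
        addPoint(x + w, y, ctm);
        addPoint(x, y + h, ctm);
        addPoint(x + w, y + h, ctm);
    }

    Rectangle2D getBounds() {
        return bounds;
    }

    public static void main(String[] args) {
        PathBoundsSketch sketch = new PathBoundsSketch();
        // A 30x40 rectangle at (10, 20), drawn under a uniform scale of 2.
        sketch.addRect(10, 20, 30, 40, AffineTransform.getScaleInstance(2, 2));
        Rectangle2D b = sketch.getBounds();
        // Prints: 20.0 40.0 80.0 120.0
        System.out.println(b.getMinX() + " " + b.getMinY() + " " + b.getMaxX() + " " + b.getMaxY());
    }
}
```

All four corners have to be transformed individually because a rotating matrix can turn any corner into the new minimum or maximum.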

Using this class to trim the white space gives pretty much the result one would hope for.

Beware: The implementation above is far from optimal. It is not even correct, as it includes all curve control points, which overestimates the bounds. Furthermore, it ignores things like line width or wedge types. It is merely a proof of concept.
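To illustrate the control point issue: a cubic Bézier curve always lies inside the box spanned by its four points, but usually does not reach the inner control points. One possible refinement, sketched below under stated assumptions, is to evaluate the curve at its end points and at the roots of its derivative in each coordinate. The class `BezierBoundsSketch` and its methods are hypothetical and not part of iText or of the original test code.

```java
import java.awt.geom.Rectangle2D;

class BezierBoundsSketch {
    /** Evaluates one coordinate of a cubic Bezier curve at parameter t. */
    static double eval(double p0, double p1, double p2, double p3, double t) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    /** Returns {min, max} of one coordinate over t in [0, 1]. */
    static double[] range(double p0, double p1, double p2, double p3) {
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);
        // Derivative / 3 is a*t^2 + b*t + c with these coefficients.
        double d0 = p1 - p0, d1 = p2 - p1, d2 = p3 - p2;
        double a = d0 - 2 * d1 + d2, b = 2 * (d1 - d0), c = d0;
        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Degenerates to a linear equation (or a constant).
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }
        for (double t : roots) {
            if (t > 0 && t < 1) { // only extrema strictly inside the curve matter
                double v = eval(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }

    /** Tight bounding box of the curve with the given end and control points. */
    static Rectangle2D bounds(double x0, double y0, double x1, double y1,
                              double x2, double y2, double x3, double y3) {
        double[] xr = range(x0, x1, x2, x3);
        double[] yr = range(y0, y1, y2, y3);
        return new Rectangle2D.Double(xr[0], yr[0], xr[1] - xr[0], yr[1] - yr[0]);
    }

    public static void main(String[] args) {
        // An arch from (0,0) to (100,0) with control points at y = 100.
        Rectangle2D b = bounds(0, 0, 0, 100, 100, 100, 100, 0);
        // The control point box would be 100 high; the curve only reaches y = 75.
        System.out.println(b.getWidth() + " x " + b.getHeight());
    }
}
```

Line width and similar stroke attributes are still ignored here, so this addresses only the first of the shortcomings listed above.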

All test code is in TestTrimPdfPage.java.

