JAI: How to extract a single page input stream from a multi-page TIFF image container?

Question

I have a component that converts PDF documents to images, one image per page. Because the component uses converters that produce in-memory images, it puts heavy pressure on the JVM heap and conversions take a while to finish.

I'm trying to improve the overall performance of the conversion process, and found a native library with a JNI binding that converts PDFs to TIFFs. The library can only convert a PDF to a single TIFF file (it requires intermediate file system storage and does not even accept streams), so the resulting TIFF file has all converted pages embedded rather than one image per page on the file system. The native library makes the overall conversion drastically faster, but there is a real bottleneck: since I need a source-page to destination-page conversion, I now have to extract every page from the resulting file and write each one elsewhere. A simple and naive approach with RenderedImages:

final SeekableStream seekableStream = new FileSeekableStream(tempFile);
final ImageDecoder imageDecoder = createImageDecoder("tiff", seekableStream, null);
...
//                                               V--- heap is wasted here
final RenderedImage renderedImage = imageDecoder.decodeAsRenderedImage(pageNumber);
// ... do the rest stuff ...

What I would really like is to extract a specific page's input stream from the TIFF container file (tempFile) and redirect it elsewhere without ever holding it as an in-memory image. I imagine an approach similar to container processing, where I seek to a specific entry and extract its data (something like ZIP file processing). But I couldn't find anything like that in ImageDecoder, or perhaps my expectations are wrong and I'm missing something important here...

Is it possible to extract TIFF container page input streams using the JAI API, or perhaps third-party alternatives? Thanks in advance.

Recommended answer

I could be wrong, but I don't think JAI supports splitting TIFFs without decoding the files to in-memory images. And, sorry for promoting my own library, but I think it does exactly what you need (the main part of the solution used to split TIFFs was contributed by a third party).

By using the TIFFUtilities class from com.twelvemonkeys.contrib.tiff, you should be able to split your multi-page TIFF into multiple single-page TIFFs like this:

TIFFUtilities.split(tempFile, new File("output"));

No decoding of the images is done; each IFD is simply split into a separate file, and the streams are written with corrected offsets and byte counts.
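To illustrate why no decoding is needed, here is a minimal sketch (not the library's actual code) that walks the IFD chain of a classic TIFF file and lists the offset of each top-level IFD, which usually corresponds to one page. The class name TiffIfdWalker and the method ifdOffsets are hypothetical names chosen for this example, and BigTIFF files are assumed to be out of scope.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of JAI or TwelveMonkeys.
final class TiffIfdWalker {

    /** Returns the file offset of every top-level IFD without reading any pixel data. */
    static List<Long> ifdOffsets(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] header = new byte[8];
            file.readFully(header);

            // Bytes 0-1 select the byte order: "II" is little endian, "MM" is big endian.
            ByteOrder order;
            if (header[0] == 'I' && header[1] == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (header[0] == 'M' && header[1] == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Not a TIFF file");
            }

            ByteBuffer buffer = ByteBuffer.wrap(header).order(order);
            // Bytes 2-3 hold the magic number 42 for classic TIFF.
            if (buffer.getShort(2) != 42) {
                throw new IOException("Not a classic TIFF (BigTIFF is not handled in this sketch)");
            }
            // Bytes 4-7 hold the unsigned offset of the first IFD.
            long offset = buffer.getInt(4) & 0xFFFFFFFFL;

            List<Long> offsets = new ArrayList<>();
            byte[] two = new byte[2];
            byte[] four = new byte[4];
            while (offset != 0) {
                if (offsets.contains(offset)) {
                    throw new IOException("Corrupt TIFF: IFD chain loops back on itself");
                }
                offsets.add(offset);

                // An IFD is a 2-byte entry count, then 12 bytes per entry,
                // then the 4-byte offset of the next IFD (0 ends the chain).
                file.seek(offset);
                file.readFully(two);
                int entryCount = ByteBuffer.wrap(two).order(order).getShort() & 0xFFFF;
                file.seek(offset + 2 + 12L * entryCount);
                file.readFully(four);
                offset = ByteBuffer.wrap(four).order(order).getInt() & 0xFFFFFFFFL;
            }
            return offsets;
        }
    }
}
```

Calling TiffIfdWalker.ifdOffsets(tempFile.getPath()) gives one offset per page in the common case. A real splitter would additionally have to copy each IFD's strip or tile data and rewrite the offsets that point to it, which is the "corrected offsets and byte counts" part mentioned above.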

Files will be named output/0001.tif, output/0002.tif, etc. If you need more control over the output names or have other requirements, you can easily modify the code. The code comes with a BSD-style license.
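If only the file names matter, one alternative to modifying the library is to rename the split files afterwards. The following is a rough sketch under that assumption; SplitPageRenamer, rename, and the "-page-N" naming pattern are all hypothetical and use nothing beyond java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical post-processing helper; not part of the TwelveMonkeys library.
final class SplitPageRenamer {

    /**
     * Renames the .tif files in dir (e.g. 0001.tif, 0002.tif) to
     * baseName-page-1.tif, baseName-page-2.tif, ... in sorted order.
     */
    static List<Path> rename(Path dir, String baseName) throws IOException {
        List<Path> pages;
        try (Stream<Path> files = Files.list(dir)) {
            // Zero-padded names sort correctly with the natural Path ordering.
            pages = files
                    .filter(p -> p.getFileName().toString().endsWith(".tif"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> renamed = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            Path target = dir.resolve(String.format("%s-page-%d.tif", baseName, i + 1));
            // Files.move fails with FileAlreadyExistsException if the target exists.
            renamed.add(Files.move(pages.get(i), target));
        }
        return renamed;
    }
}
```

For example, SplitPageRenamer.rename(Paths.get("output"), "invoice") would turn output/0001.tif into output/invoice-page-1.tif. The move happens on the file system, so no image data is loaded into the heap.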


09-22 21:43