This article describes approaches for building high-resolution images from very large data sets with Python; it may be a useful reference for anyone facing a similar problem.

Problem Description

Say I have some huge amount of data stored in an HDF5 data file (size: 20k x 20k, if not more) and I want to create an image from all of this data using Python. Obviously, this much data cannot be opened and stored in the memory without an error. Therefore, is there some other library or method that would not require all of the data to be dumped into the memory and then processed into an image (like how the libraries: Image, matplotlib, numpy, etc. handle it)?

Thanks.

This question comes from a similar question I asked: "Generating pcolormesh images from very large data sets saved in H5 files with Python". But I think that the question I posed here covers a broader range of applications.

Edit (7.6.2013)

Allow me to clarify my question further: In the first question (the link), I was using the easiest method I could think of to generate an image from a large collection of data stored in multiple files. This method was to import the data, generate a pcolormesh plot using matplotlib, and then save a high resolution image from this plot. But there are obvious memory limitations to this approach. I can only import about 10 data sets from the files before I reach a memory error.

In that question, I was asking if there is a better method to patch together the data sets (that are saved in HDF5 files) into a single image without importing all of the data into the memory of the computer. (I will likely require 100s of these data sets to be patched together into a single image.) Also, I need to do everything in Python to make it automated (as this script will need to be run very often for different data sets).

The real question I discovered while trying to get this to work using various libraries is: How can I work with high resolution images in Python? For example, if I have a very high resolution PNG image, how can I manipulate it with Python (crop, split, run through an fft, etc.)? In my experience, I have always run into memory issues when trying to import high resolution images (think ridiculously high resolution pictures from a microscope or telescope (my application is a microscope)). Are there any libraries designed to handle such images?

Or, conversely, how can I generate a high resolution image from a massive amount of data saved in a file with Python? Again, the data file could be arbitrarily large (5-6 gigabytes, if not larger).

But in my actual application, my question is: Is there a library or some kind of technique that would allow me to take all of the data sets that I receive from my device (which are saved in HDF5) and patch them together to generate an image from all of them? Or I could save all of the data sets in a single (very large) HDF5 file. Then how could I import this one file and then create an image from its data?

I do not care about displaying the data in some interactive plot. The resolution of the plot is not important. I can easily use a lower resolution for it, but I must be able to generate and save a high resolution image from the data.

Hope this clarifies my question. Feel free to ask any other questions about my question.

Recommended Answer

You say it "obviously can't be stored in memory", but the following calculations say otherwise.

20,000 * 20,000 pixels * 4 channels * 1 byte/channel = 1.6 GB
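The arithmetic can be checked directly:

```python
# 20,000 x 20,000 pixels, 4 channels (e.g. RGBA), 1 byte per channel
width, height, channels = 20_000, 20_000, 4
total_bytes = width * height * channels
print(total_bytes / 1e9)  # 1.6 (GB)
```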

Most reasonably modern computers have 8GB to 16GB of memory so handling 1.6GB shouldn't be a problem.

However, in order to handle the patchworking you need to do, you could stream each pixel from one file into the other. This assumes the format is a lossless bitmap using a linear encoding format like BMP or TIFF. Simply read each file and append to your result file.
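A sketch of that streaming idea, assuming the datasets are stacked vertically and each HDF5 file holds a 2-D dataset named "data" (the file names and dataset name are placeholders; small synthetic files stand in for the real ones):

```python
import numpy as np
import h5py

# Create two small stand-in HDF5 files, each with a 2-D uint8 dataset.
for i in range(2):
    with h5py.File(f"part{i}.h5", "w") as f:
        f["data"] = np.full((100, 50), i, dtype=np.uint8)

# Stream each dataset into one raw output file in small row blocks --
# only one block is ever resident in memory at a time.
with open("stitched.raw", "wb") as out:
    for i in range(2):
        with h5py.File(f"part{i}.h5", "r") as f:
            dset = f["data"]
            for start in range(0, dset.shape[0], 32):  # 32-row blocks
                out.write(dset[start:start + 32, :].tobytes())

# The raw file can later be viewed as one (200, 50) image without
# loading it all: numpy memory-maps it from disk.
stitched = np.memmap("stitched.raw", dtype=np.uint8, mode="r",
                     shape=(200, 50))
```

h5py reads only the sliced rows from disk, so the peak memory use is one block, not the whole mosaic.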

You may need to get a bit clever if the files are different sizes or patched together in some type of grid. In that case, you'd need to calculate the total dimensions of the resulting image and offset the file writing pointer.
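For the grid case, one way to manage the offsets is to memory-map the full-size output and write each tile into its window; this is a sketch assuming all tiles share one shape and dtype (the tile contents here are synthetic stand-ins for data loaded from files):

```python
import numpy as np

tile_h, tile_w = 100, 100
grid_rows, grid_cols = 2, 3  # a 2 x 3 grid of tiles

# Memory-mapped output: the full mosaic lives on disk, not in RAM.
mosaic = np.memmap("mosaic.raw", dtype=np.uint8, mode="w+",
                   shape=(grid_rows * tile_h, grid_cols * tile_w))

for r in range(grid_rows):
    for c in range(grid_cols):
        # Stand-in for a tile read from one input file.
        tile = np.full((tile_h, tile_w), r * grid_cols + c,
                       dtype=np.uint8)
        # Offset the write window to this tile's grid position.
        mosaic[r * tile_h:(r + 1) * tile_h,
               c * tile_w:(c + 1) * tile_w] = tile

mosaic.flush()  # push pending writes to the file on disk
```

Writes through the memmap update the file directly, so the operating system handles the actual file-pointer offsets.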

That concludes this article on building high-resolution images with Python; hopefully the recommended answer above is helpful.
