Extract a page from a PDF as a JPEG

Problem description

In Python code, how can I efficiently save a certain page of a PDF as a JPEG file? (Use case: I have a Python Flask web server where PDFs will be uploaded and a JPEG corresponding to each page is stored.)

This solution is close, but the problem is that it does not convert the entire page to JPEG.

Recommended answer

The pdf2image library can be used.

You can install it simply with:

pip install pdf2image

Once installed, you can use the following code to get the images.

from pdf2image import convert_from_path
# 'pdf_file' is the path to the PDF; 500 is the rendering resolution in DPI
pages = convert_from_path('pdf_file', 500)

Saving pages in JPEG format

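# Give every page its own file name so later pages do not overwrite earlier ones
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')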
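
For the upload use case in the question, the conversion and saving steps above might be combined roughly as follows. This is a minimal sketch under stated assumptions, not the library's prescribed approach: `page_path` and `save_pdf_pages` are hypothetical helper names, and the file-naming scheme and the default of 200 DPI are illustrative choices. `convert_from_bytes` and its `dpi`, `first_page` and `last_page` parameters come from pdf2image.

```python
from pathlib import Path


def page_path(out_dir, stem, page_number):
    """Build a zero-padded JPEG path, e.g. out/report_page_0003.jpg (hypothetical naming scheme)."""
    return Path(out_dir) / f"{stem}_page_{page_number:04d}.jpg"


def save_pdf_pages(pdf_bytes, out_dir, stem, dpi=200, first_page=None, last_page=None):
    """Render pages of an in-memory PDF and write each one as a separate JPEG file.

    Returns the list of paths that were written.
    """
    # Imported lazily so that page_path() can be used without pdf2image installed.
    from pdf2image import convert_from_bytes

    Path(out_dir).mkdir(parents=True, exist_ok=True)

    # first_page / last_page are 1-based; None means "no limit on that side".
    pages = convert_from_bytes(
        pdf_bytes, dpi=dpi, first_page=first_page, last_page=last_page
    )

    start = first_page or 1
    written = []
    for offset, page in enumerate(pages):
        target = page_path(out_dir, stem, start + offset)
        page.save(target, "JPEG")  # each page is a PIL image
        written.append(target)
    return written
```

In a Flask handler, the bytes read from the uploaded file could be passed as `pdf_bytes`. Passing the same number as `first_page` and `last_page` renders only that page, which avoids converting the whole document when a single page is needed.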

The GitHub repo for pdf2image also mentions that it uses pdftoppm and that it requires other installations:

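pdftoppm is the piece of software that does the actual magic. It is distributed as part of a larger package called poppler. Windows users will have to install poppler for Windows. Mac users will have to install poppler for Mac. Linux users will have pdftoppm pre-installed with the distro (tested on Ubuntu and Archlinux); if it is not, run sudo apt install poppler-utils.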

You can install the latest version under Windows using Anaconda by running:

conda install -c conda-forge poppler

Note: Windows versions up to 0.67 are available at http://blog.alivate.com.au/poppler-windows/, but 0.68 was released in August 2018, so you will not be getting the latest features or bug fixes.
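
Because pdf2image depends on the external pdftoppm program, it can help to check that the program is reachable before converting anything. The snippet below is a small sketch of such a check using only the Python standard library; `find_executable` is a hypothetical helper name, not part of pdf2image.

```python
import shutil


def find_executable(name="pdftoppm"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_executable()
    if location is None:
        print("pdftoppm was not found on PATH; install poppler first.")
    else:
        print(f"pdftoppm found at {location}")
```

If poppler is installed somewhere that is not on PATH, pdf2image's conversion functions also accept a `poppler_path` argument pointing at the directory that contains the binaries.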
