Problem description
I am working on an online portal where researchers can upload their research papers. One requirement is that all PDFs are stored in PDF/A format. Since I can't rely on users to generate PDF/A-conforming documents, I need a tool to check standard PDFs and convert them into PDF/A.
What is the best tool you know of, with regard to:
- Price
- Quality
- Speed
- Available APIs
Open-source tools would be preferred, but a search revealed none. iText can create PDF/A, but converting isn't easy: you have to read every page and copy it to a new document, losing all bookmarks and annotations in the process. (At least as far as I know; if you know of an easy solution, let me know.)
APIs should be available for either PHP or Java, or a command-line tool should be provided. Please do not list GUI-only or online-only solutions.
Recommended answer
I am not sure all your goals can be satisfied at the same time. The story around PDF/A is a lot more complex than a format conversion like TIFF to PNG.
- The base format of PDF/A-1 is PDF 1.4: what do you do with documents that declare a higher version and use features from those higher versions? Information might be lost.
- In both PDF/A-1a and 1b, metadata in XMP/RDF format is mandatory. If the original document has no metadata, you'll have to get it from somewhere and add it. iText, at least, can do that.
- There are lots of little details to get right, from embedding fonts to making sure spaces are present instead of only horizontal movement commands.
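
To make the first two points a bit more concrete, here is a rough pre-check sketch in Python. It is not a PDF/A validator: it only reads the `%PDF-x.y` header and looks for a `pdfaid:part` claim in the XMP packet. It assumes the XMP metadata is stored uncompressed (which, as far as I know, PDF/A-1 requires), and the function names `pdf_header_version`, `claimed_pdfa_part` and `precheck` are made up for this example. A header version above 1.4 is reported as a hint that later features may be in use, not as a violation in itself.

```python
import re


def pdf_header_version(data):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def claimed_pdfa_part(data):
    """Return the PDF/A part claimed in uncompressed XMP, or None.

    XMP can carry pdfaid:part either as an element
    (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
    """
    match = re.search(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)", data)
    return int(match.group(1)) if match else None


def precheck(data):
    """Return a list of hints; an empty list does NOT prove compliance."""
    hints = []
    version = pdf_header_version(data)
    if version is None:
        hints.append("no %PDF header found")
    elif version > (1, 4):
        # Assumption: a later header only suggests later features are used.
        hints.append("header declares PDF %d.%d, above 1.4" % version)
    if claimed_pdfa_part(data) is None:
        hints.append("no pdfaid:part claim found in XMP metadata")
    return hints


if __name__ == "__main__":
    # In-memory sample standing in for an uploaded file's bytes.
    sample = b"%PDF-1.6\n1 0 obj\n<< >>\nendobj\n%%EOF"
    for hint in precheck(sample):
        print(hint)
```

A file that passes this check only claims to be PDF/A; fonts, colour spaces and the other details still need a real validator.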
To sum it all up: I think you are better off placing some or all of the responsibility for compliance on the producers of the PDFs. Of course, that doesn't mean you can't help them: if you figure out which tools the majority use to create their papers, you can point them to documentation about PDF/A for those specific tools. (As a bit of an extreme example of such documentation, have a look at this.)
Good luck.