I am working on an online portal where researchers can upload their research papers. One requirement is that all PDFs are stored in PDF/A format. As I can't rely on the users to generate PDF/A-conforming documents, I need a tool to check standard PDFs and convert them to PDF/A.
What is the best tool you know of, judged by the following criteria?
- Price
- Quality
- Speed
- Available APIs
Open-source tools would be preferred, but a search revealed none. iText can create PDF/A, but converting isn't easy: you have to read every page and copy it to a new document, losing all bookmarks and annotations in the process. (At least as far as I know. If you know of an easy solution, let me know.)
The tool should offer an API for PHP or Java, or come with a command-line interface. Please do not list GUI-only or online-only solutions.
I am not sure all your goals can be satisfied at the same time. The story around PDF/A is a lot more complex than format conversions like TIFF to PNG.
- The base format of PDF/A-1 is PDF 1.4: what should happen to documents in a higher PDF version that use features introduced after 1.4? Information might be lost.
- In both PDF/A-1a and 1b, metadata in XMP/RDF format is mandatory. If the original document is without metadata, you'll have to get it from somewhere and add it. At least iText can do that.
- There are lots of little details to get right, from embedding fonts to making sure spaces are present instead of only horizontal movement commands.
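As a rough illustration of the first point, here is a minimal Java sketch that reads the `%PDF-x.y` header of an uploaded file and flags anything above 1.4. It is a pre-check only, not a PDF/A validator: the class and method names are made up for this example, it assumes the header sits within the first 1024 bytes, and it ignores the fact that a document catalog can declare a different version than the header. A file that passes can still fail on fonts, metadata, or any of the other details above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of iText or any other library.
public class PdfHeaderCheck {

    // Matches the version marker that starts a PDF file, e.g. "%PDF-1.6".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns {major, minor} from the header, or null if no header is found. */
    public static int[] parseHeaderVersion(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** True if the version is newer than PDF 1.4, the base format of PDF/A-1. */
    public static boolean exceedsPdf14(int[] version) {
        return version[0] > 1 || (version[0] == 1 && version[1] > 4);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderCheck <file.pdf>");
            System.exit(2);
        }
        byte[] buf = new byte[1024]; // assumption: the header is within the first 1 KiB
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(buf);
        }
        int[] version = parseHeaderVersion(Arrays.copyOf(buf, Math.max(n, 0)));
        if (version == null) {
            System.out.println("No PDF header found");
        } else if (exceedsPdf14(version)) {
            System.out.println("Header declares PDF " + version[0] + "." + version[1]
                    + ": may use features that PDF/A-1 does not allow");
        } else {
            System.out.println("Header declares PDF " + version[0] + "." + version[1]
                    + ": version is within the PDF/A-1 base format");
        }
    }
}
```

A check like this could at most be used to warn uploaders early; deciding what to do with the newer features is still the hard part.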
To sum it all up: I think you are better off placing some or all of the responsibility for compliance with the producers of the PDFs. Of course, that doesn't mean you can't help them: if you figure out which tools the majority use to create their papers, you can point them to documentation about producing PDF/A with those specific tools. (As a bit of an extreme example of such documentation, have a look at this.)
