Problem description
What I want to do is pretty simple: given a PDF/PS/DjVu file containing a paper or book, find its authors and title (any other metadata would be welcome, but matters less). The recognition doesn't have to be perfect, but I'd like to make it as good as I can. I am looking for open-source .NET and/or Java libraries (preferably .NET) that give access to the metadata and contents of these files.
For PDF I've found PDFBox (.NET/Java) and PDF Library (.NET), but there may be better alternatives I am not aware of. For PostScript and DjVu, I haven't found anything.
Recommended answer
For DjVu, you can use the commercial SDK from CamiNova or the open-source library DjVuLibre.
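For PostScript, where the question found no library, one low-effort starting point is the header comments defined by Adobe's Document Structuring Conventions (DSC), such as `%%Title:`, `%%Creator:` and `%%For:`. The sketch below is only an illustration under stated assumptions: the class name `DscHeaderReader` and the method `parseHeader` are made up for this example, the header is treated as ending at `%%EndComments` or at the first line that does not start with `%` (a simplification of the DSC rule), and continuation lines (`%%+`) are ignored. Note that DSC has no dedicated author field: `%%Creator` usually names the generating program, and `%%Title` is often just a file name, so the result is at best a hint.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library mentioned in this thread.
public class DscHeaderReader {

    /**
     * Collects "%%Key: value" pairs from the leading DSC comment block.
     * Assumption: the caller supplies the file's lines in order.
     */
    public static Map<String, String> parseHeader(Iterable<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.startsWith("%%EndComments")) {
                break; // explicit end of the header section
            }
            if (!line.startsWith("%")) {
                break; // simplified rule: first non-comment line ends the header
            }
            if (!line.startsWith("%%")) {
                continue; // e.g. the "%!PS-Adobe-3.0" version line
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // comment without a value, or a "%%+" continuation
            }
            String key = line.substring(2, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Text values are often written as PostScript strings: (like this)
            if (value.length() >= 2 && value.startsWith("(") && value.endsWith(")")) {
                value = value.substring(1, value.length() - 1);
            }
            fields.putIfAbsent(key, value); // keep the first occurrence
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample header, for illustration only.
        List<String> sample = Arrays.asList(
                "%!PS-Adobe-3.0",
                "%%Title: (A Sample Paper)",
                "%%Creator: dvips",
                "%%For: J. Doe",
                "%%EndComments",
                "%%Page: 1 1");
        Map<String, String> header = parseHeader(sample);
        System.out.println("Title: " + header.get("Title"));
        System.out.println("For: " + header.get("For"));
    }
}
```

For a real file, the lines could be read with `java.nio.file.Files.readAllLines(path, StandardCharsets.ISO_8859_1)`; a single-byte charset is suggested here only so that binary data later in the file does not cause a decoding error. When the header yields nothing useful, falling back to the extracted text of the first page is probably unavoidable.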