c# - Extract PDF text by coordinates

I would like to know whether there is a PDF library for Microsoft .NET that can extract text from a region specified by coordinates.

For example (in pseudocode):

PdfReader reader = new PdfReader();
reader.Load("file.pdf");

// Top, bottom, left, right in pixels or any other unit
string wholeText = reader.GetText(100, 150, 20, 50);

I tried to do this with PDFBox for .NET (which runs on top of IKVM), but had no luck. It also seems to be outdated and undocumented.

Maybe someone has already done this with PDFBox, iTextSharp, or any other open-source library and can give me a hint.

Thanks in advance.

Best answer

OK, thanks for your effort, everyone.

I got it working with Apache's PDFBox compiled with IKVM. This is the final code:

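// Load the document through the PDFBox classes that IKVM exposes to .NET.
PDDocument doc = PDDocument.load(@"c:\invoice.pdf");

// Register a named region; java.awt.Rectangle takes (x, y, width, height).
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion("testRegion", new java.awt.Rectangle(0, 10, 100, 100));

// Extract the text of all registered regions from the first page.
stripper.extractRegions((PDPage)doc.getDocumentCatalog().getAllPages().get(0));
string text = stripper.getTextForRegion("testRegion");

// Close the document once the text has been read.
doc.close();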

It works like a charm.
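One detail worth spelling out: the pseudocode in the question passes top, bottom, left and right edges, while java.awt.Rectangle takes x, y, width and height. Below is a minimal sketch of that conversion. It assumes all four edges are measured from the top-left corner of the page and are in the same unit the stripper works in. RegionHelper and ToRect are hypothetical names used for illustration only; they are not part of PDFBox or IKVM.

```csharp
using System;

// Hypothetical helper (not part of PDFBox): converts edge coordinates
// into the x, y, width, height form that java.awt.Rectangle expects.
public static class RegionHelper
{
    // Assumption: all edges are measured from the page's top-left corner,
    // so top <= bottom and left <= right.
    public static (int X, int Y, int Width, int Height) ToRect(
        int top, int bottom, int left, int right)
    {
        if (bottom < top || right < left)
            throw new ArgumentException("Expected top <= bottom and left <= right.");

        // x and y are the top-left corner; width and height are the edge distances.
        return (left, top, right - left, bottom - top);
    }
}
```

With the values from the question, RegionHelper.ToRect(100, 150, 20, 50) gives x = 20, y = 100, width = 30, height = 50, which could then be passed to new java.awt.Rectangle(x, y, width, height) when calling addRegion.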

Anyway, thank you. I hope my own answer helps others too. If you need more details, just leave a comment here and I will update this answer.