Problem description
I have to extract text from a PDF document within a specific rectangular region. The workflow is as follows. First, the PDF is converted to a JPG image. Then the user draws a selection rectangle on top of the picture. Then I somehow need to extract all the text from the PDF document within that selection region. Any suggestions on which freeware PDF libraries accessible from C# to use?
Recommended answer
I agree, OCR is not the approach to use here. You need a PDF library that can extract the text along with its bounding box coordinates.
QuickPDF is a commercial library (www.quickpdf.com) that can extract the required information for a very reasonable price of $249. The function you are looking for is DAExtractPageText (http://www.quickpdflibrary.com/help/quickpdf/DAExtractPageText.php). It extracts the text for the whole page, so you would then need to use simple Point and/or Rectangle functions to limit the text to your selected rectangle.
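The "limit the text to your selected rectangle" step can be sketched roughly as follows. This is a minimal, library-independent illustration in Python (the arithmetic ports directly to C#), not QuickPDF's actual API: `Rect`, `TextBlock`, `pixels_to_pdf_rect` and `text_in_region` are hypothetical names, and the list of blocks stands in for whatever text-plus-coordinates records your PDF library returns. It assumes the image was rendered from the full, unrotated page at a known DPI, and that the library reports boxes in PDF points (72 per inch) with the origin at the bottom-left of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF points; left <= right, bottom <= top."""
    left: float
    bottom: float
    right: float
    top: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.bottom <= y <= self.top


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical record: a piece of text and its bounding box."""
    text: str
    box: Rect


def pixels_to_pdf_rect(x0, y0, x1, y1, dpi, page_height_pt):
    """Convert a selection drawn on the rendered image into PDF points.

    Image pixels have their origin at the top-left with y growing downward;
    the assumed PDF space has its origin at the bottom-left with y growing
    upward, so the y axis is flipped after scaling.
    """
    scale = 72.0 / dpi  # points per pixel
    left, right = sorted((x0 * scale, x1 * scale))
    bottom, top = sorted((page_height_pt - y0 * scale,
                          page_height_pt - y1 * scale))
    return Rect(left, bottom, right, top)


def text_in_region(blocks, region):
    """Return the text of blocks whose box centre lies inside the region."""
    hits = []
    for block in blocks:
        cx = (block.box.left + block.box.right) / 2
        cy = (block.box.bottom + block.box.top) / 2
        if region.contains_point(cx, cy):
            hits.append(block)
    # Approximate reading order: top to bottom, then left to right.
    hits.sort(key=lambda b: (-b.box.top, b.box.left))
    return [b.text for b in hits]


# Made-up sample data on a US Letter page (612 x 792 points).
blocks = [
    TextBlock("Total:", Rect(72, 100, 110, 112)),
    TextBlock("No. 17", Rect(140, 700, 190, 714)),
    TextBlock("Invoice", Rect(72, 700, 130, 714)),
]
# Selection dragged from pixel (90, 100) to (420, 220) on a 150 DPI render.
region = pixels_to_pdf_rect(90, 100, 420, 220, dpi=150, page_height_pt=792)
print(text_in_region(blocks, region))  # ['Invoice', 'No. 17']
```

Testing the centre of each box is one possible design choice: it keeps words that straddle the selection edge only when most of the word is inside. Requiring full containment or any overlap are equally valid alternatives, depending on how forgiving the selection should feel.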
Based on my research, I don't believe iText has this capability.