Problem description
I'm trying to extract text from this PDF file: http://www.filedropper.com/copy_1, but I get less than half of the text on the page. I'm using iTextSharp:
PdfReader reader = new PdfReader(file);
string currentText = PdfTextExtractor.GetTextFromPage(reader, 1);
I have also tried SimpleTextExtractionStrategy instead of the default LocationTextExtractionStrategy:
string currentText = PdfTextExtractor.GetTextFromPage(reader, 1, new SimpleTextExtractionStrategy());
The file was originally generated by Microsoft Reporting Service (to which I don't have access), and I extracted a single page from it to test the text extraction.
Can anyone help?
Recommended answer
Try this:
PdfReader reader = new PdfReader(file);
StringBuilder currentText = new StringBuilder();
// Page numbers in iTextSharp are 1-based.
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    currentText.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}
Then perform whatever operation you want on currentText.
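For reference, the loop above can be wrapped in a self-contained method. This is only a minimal sketch, assuming iTextSharp 5.x is referenced by the project; the class name PdfTextDump, the method name ExtractAllText and the path parameter are illustrative names, not part of the library. It passes the default LocationTextExtractionStrategy explicitly, so for a one-page file it should return the same text as the single-page call in the question.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextDump
{
    // Hypothetical helper: returns the extracted text of every page,
    // one page per block, separated by line breaks.
    public static string ExtractAllText(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            StringBuilder text = new StringBuilder();
            // Page numbers in iTextSharp are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Create a fresh strategy per page so that no text chunks
                // collected on one page carry over to the next.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
            return text.ToString();
        }
        finally
        {
            // Release the file handle even if extraction throws.
            reader.Close();
        }
    }
}
```

A call such as PdfTextDump.ExtractAllText(file) could then replace the inline loop. Swapping in new SimpleTextExtractionStrategy() on the strategy line gives the variant mentioned in the question.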