This article covers how to read text from a PDF file in C#.
Problem description
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Extracts the text of every page with iTextSharp and re-paginates it
// into blocks of PerPageText characters, separated by HTML line breaks.
public string GetPDFText(string pdfPath)
{
    const int PerPageText = 2000; // characters per output page
    PdfReader reader = new PdfReader(pdfPath);
    StringWriter output = new StringWriter();
    string text = string.Empty;   // text not yet written to output
    int pageNumber = 1;           // output page counter, independent of PDF pages
    try
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text += PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy());

            // Write out every complete block accumulated so far.
            while (text.Length >= PerPageText)
            {
                output.WriteLine("Page " + pageNumber + "<br />" + text.Substring(0, PerPageText) + "<br /><br />");
                pageNumber++;
                text = text.Substring(PerPageText);
            }
        }

        // The original loop never wrote the final block when it was shorter
        // than PerPageText, so the end of the document was lost.
        if (text.Length > 0)
        {
            output.WriteLine("Page " + pageNumber + "<br />" + text + "<br /><br />");
        }
    }
    finally
    {
        reader.Close(); // release the file handle held by the reader
    }
    return output.ToString();
}
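The chunking step in the method above can be tested separately from the PDF library. The following is a minimal sketch, not part of the original question: `TextPager` and `SplitIntoPages` are hypothetical names introduced here for illustration. It splits a string into fixed-size blocks and keeps the final, shorter block instead of discarding it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, introduced only for illustration.
public static class TextPager
{
    // Splits text into chunks of at most pageSize characters.
    // The last chunk may be shorter than pageSize; it is kept, not dropped.
    public static List<string> SplitIntoPages(string text, int pageSize)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var pages = new List<string>();
        for (int start = 0; start < text.Length; start += pageSize)
        {
            // Clamp the length so the final chunk does not run past the end.
            int length = Math.Min(pageSize, text.Length - start);
            pages.Add(text.Substring(start, length));
        }
        return pages;
    }
}
```

For example, `TextPager.SplitIntoPages("abcdefg", 3)` yields the three chunks "abc", "def" and "g". Collecting the full extracted text first and splitting it once avoids repeated `Substring` bookkeeping inside the page loop, at the cost of holding the whole document text in memory.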
Recommended answer