本文介绍了从.PDF文件提取数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我需要从.PDF文件中提取数据,并将其加载到SQL 2008
任何一个可以告诉我如何着手?
I need to extract data from .PDF files and load it in to SQL 2008.Can any one tell me how to proceed??
推荐答案
下面是如何使用iTextSharp的从PDF提取文本数据的示例。你必须与它的一些捣鼓让它做你想要什么,我认为这是一个很好的轮廓。你可以看到StringBuilder中被用于存储文本,但你可以很容易地改变使用SQL。
Here is an example of how to use iTextSharp to extract text data from a PDF. You'll have to fiddle with it some to make it do exactly what you want, I think it's a good outline. You can see how the StringBuilder is being used to store the text, but you could easily change that to use SQL.
static void Main(string[] args)
{
PdfReader reader = new PdfReader(@"c:\test.pdf");
StringBuilder builder = new StringBuilder();
for (int x = 1; x <= reader.NumberOfPages; x++)
{
PdfDictionary page = reader.GetPageN(x);
IRenderListener listener = new SBTextRenderer(builder);
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
PdfDictionary pageDic = reader.GetPageN(x);
PdfDictionary resourcesDic = pageDic.GetAsDict(PdfName.RESOURCES);
processor.ProcessContent(ContentByteUtils.GetContentBytesForPage(reader, x), resourcesDic);
}
}
public class SBTextRenderer : IRenderListener
{
private StringBuilder _builder;
public SBTextRenderer(StringBuilder builder)
{
_builder = builder;
}
#region IRenderListener Members
public void BeginTextBlock()
{
}
public void EndTextBlock()
{
}
public void RenderImage(ImageRenderInfo renderInfo)
{
}
public void RenderText(TextRenderInfo renderInfo)
{
_builder.Append(renderInfo.GetText());
}
#endregion
}
这篇关于从.PDF文件提取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!