iTextSharp XMLWorker parsing is really slow

Problem description

I am parsing an HTML string with iTextSharp XMLWorker in my WPF application, using the code below:

```csharp
var css = "";
using (var htmlMS = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
    // Create a stream to read our CSS
    using (var cssMS = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(css)))
    {
        // Get an instance of the generic XMLWorker
        var xmlWorker = XMLWorkerHelper.GetInstance();

        // Parse our HTML using everything set up above
        xmlWorker.ParseXHtml(writer, doc, htmlMS, cssMS, System.Text.Encoding.UTF8, fontProv);
    }
}
```

The parsing works, but it is really slow: it takes around 2 seconds to parse the HTML, so a 50-page PDF takes around 2 minutes. I am using inline styling in my HTML string. Is this the natural behaviour, or can it be optimized?

Recommended answer

The question is wrong in the sense that it suggests the HTML parsing is what slows everything down. That's not true: the bottleneck occurs even before the first snippet of HTML is parsed.

You are using the most basic handful of lines to create your PDF from HTML, as demonstrated in the ParseHtml example:

```java
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML));
    // step 5
    document.close();
}
```

This code is simple, but it performs a lot of operations internally, as explained in the comments on this other question: XMLWorkerHelper performance slow.

Registering font directories consumes plenty of time. You can avoid this by using your own FontProvider, as is done in the ParseHtmlFonts example:

```java
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    writer.setInitialLeading(12.5f);
    // step 3
    document.open();
    // step 4

    // CSS
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
    cssResolver.addCss(cssFile);

    // HTML: register only the fonts we need instead of scanning font directories
    XMLWorkerFontProvider fontProvider =
            new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/Cardo-Regular.ttf");
    fontProvider.register("resources/fonts/Cardo-Bold.ttf");
    fontProvider.register("resources/fonts/Cardo-Italic.ttf");
    fontProvider.addFontSubstitute("lowagie", "cardo");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    p.parse(new FileInputStream(HTML));
    // step 5
    document.close();
}
```

In this case, we pass DONTLOOKFORFONTS to the font provider, which saves an enormous amount of time. Instead of having iText look for fonts, we tell iText which fonts we're going to use in the HTML.
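The question reports roughly 2 seconds per parse across 50 pages, so a related idea is to pay any font setup cost only once and reuse the result for every page. The sketch below does not call iText. `Memo`, `ReuseDemo`, and `expensiveSetup` are hypothetical names, and the string returned by `expensiveSetup` merely stands in for a configured font provider. It only illustrates the build-once, reuse-many pattern, assuming the provider can safely be shared between parse calls.

```java
import java.util.function.Supplier;

// Hypothetical helper: builds a value on first use and returns the same
// instance on every later call.
final class Memo<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean built;

    Memo(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (!built) {
            value = factory.get();
            built = true;
        }
        return value;
    }
}

// Hypothetical demo: counts how often the costly setup actually runs.
final class ReuseDemo {
    static int builds = 0;

    // Stand-in for costly setup such as registering fonts (assumption).
    static String expensiveSetup() {
        builds++;
        return "font-provider";
    }

    // Simulates rendering `pages` pages that all share one provider and
    // returns the number of times the setup was executed.
    static int renderPages(int pages) {
        builds = 0;
        Memo<String> provider = new Memo<>(ReuseDemo::expensiveSetup);
        for (int i = 0; i < pages; i++) {
            provider.get(); // every page reuses the same instance
        }
        return builds;
    }

    public static void main(String[] args) {
        System.out.println("setup runs for 50 pages: " + renderPages(50));
    }
}
```

In a real application the factory would construct and configure the font provider, and each parse call would receive the shared instance. Whether that is appropriate depends on how the provider is used elsewhere in the application.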