Converting between document formats in C#

Question

What is the best way to convert between HTML, XML, and XSL-FO in C#?

I already have the HTML (piped in from FCKEditor) and I'd like to print a PDF (I have an XSL->PDF converter). I just can't seem to find a library that will convert from HTML into anything XSL friendly.

Recommended answer

A year or two back, I had to generate PDFs from a C++/C# program. In the end I settled on launching Apache's Java FOP as a separate process to do the conversion. The experience with XSL-FO was not a pleasant one. At the time, there didn't appear to be a single tool that had implemented XSL-FO completely. Tools tended to pick a subset of the specification and hack away at that. Given the sprawling complexity of XSL-FO, I'm starting to wonder if there will ever be a full implementation.
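For anyone wanting to try the same approach, here is a minimal sketch of launching FOP as a separate process from C#. It assumes FOP's standard command-line launcher and its `-fo` / `-pdf` options; the class name `FopRunner`, its method names, and the `fopCommand` parameter (the path to your local `fop` or `fop.bat` script) are illustrative, not part of any library.

```csharp
using System;
using System.Diagnostics;

public static class FopRunner
{
    // Builds the argument string for FOP's command-line launcher,
    // i.e. the equivalent of: fop -fo input.fo -pdf output.pdf
    public static string BuildArguments(string foPath, string pdfPath)
    {
        return "-fo \"" + foPath + "\" -pdf \"" + pdfPath + "\"";
    }

    // fopCommand is an assumption: the full path to the FOP launcher
    // script on this machine (fop on Unix, fop.bat on Windows).
    public static void Convert(string fopCommand, string foPath, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fopCommand,
            Arguments = BuildArguments(foPath, pdfPath),
            UseShellExecute = false,      // required to redirect streams
            RedirectStandardError = true, // capture FOP's error messages
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block FOP.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("FOP failed: " + errors);
        }
    }
}
```

Keeping FOP in its own process means a crash or hang in the Java side does not take the C# program down with it, at the cost of JVM start-up time on every conversion.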

FOP tended to be buggy, and considerable time was spent working around issues. XSLT and XPath were difficult to learn. It took a few weeks before I was seeing past the verbosity and could quickly get things done. I don't think I ever quite got my head around XSL-FO though. It makes the HTML and CSS model look like a child's toy. Luckily, the PDFs generate, and don't have too many problems. :-)

Anyway, the task at hand: generating PDFs from the XHTML output of FCKEditor.

"I just can't seem to find a library that will convert from HTML into anything XSL friendly."

Heh. Yeah, that's 'cos there isn't one, and there probably won't ever be an HTML to XSL-FO converter that's any good. Such a converter has a few things against it: the complexity of browsers and the complexity of XSL-FO. For such a converter to deal with an average HTML document, it needs the guts of a web browser: the layout, CSS support, probably even JavaScript. Then it has to take the rendered page and figure out what XSL-FO is needed to get something which looks similar and fits within the paged constraints of XSL-FO.

It's like the problem with making a Word viewer: without reimplementing a lot of Word, it sucks most of the time because it doesn't look the same.

So... what can you do? Well, having a small subset of HTML to work with is a good start. Hopefully the output from FCKEditor is XHTML, as getting HTML into XML is a world of pain in itself (which Tidy can be useful for). Next, unless some poor soul has already made an FCKEditor XHTML -> XSL-FO XSLT for your XSL-FO implementation, you'll have to make one. That involves learning XSL-FO, XSLT and XPath. In my experience it'll take a few weeks and will be a cobbled-together solution.
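To give a feel for what such an XHTML -> XSL-FO stylesheet involves, here is a minimal sketch using .NET's built-in `XslCompiledTransform`. The stylesheet is purely illustrative: it handles only paragraphs and bold/italic inlines, and the page geometry is an arbitrary A4 choice. The class name `XhtmlToFo` is made up for this example, and the input is assumed to be well-formed XHTML with no DOCTYPE and no named entities such as `&nbsp;`, both of which the default reader settings would reject.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XhtmlToFo
{
    // Illustrative stylesheet: maps a tiny XHTML subset to XSL-FO.
    const string Stylesheet = @"
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'
    xmlns:h='http://www.w3.org/1999/xhtml'
    exclude-result-prefixes='h'>
  <xsl:template match='/'>
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name='page'
            page-height='29.7cm' page-width='21cm' margin='2cm'>
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference='page'>
        <fo:flow flow-name='xsl-region-body'>
          <xsl:apply-templates select='//h:body'/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <xsl:template match='h:p'>
    <fo:block space-after='6pt'><xsl:apply-templates/></fo:block>
  </xsl:template>
  <xsl:template match='h:b | h:strong'>
    <fo:inline font-weight='bold'><xsl:apply-templates/></fo:inline>
  </xsl:template>
  <xsl:template match='h:i | h:em'>
    <fo:inline font-style='italic'><xsl:apply-templates/></fo:inline>
  </xsl:template>
</xsl:stylesheet>";

    // Transforms an XHTML string into an XSL-FO string.
    public static string Transform(string xhtml)
    {
        var xslt = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(xsltReader);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xhtml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            xslt.Transform(input, writer);
        return output.ToString();
    }
}
```

Calling `XhtmlToFo.Transform` on a small XHTML document yields an `fo:root` tree that could then be handed to an XSL-FO processor. Every further element the editor can emit (lists, tables, images, links) needs its own template, which is where the weeks go.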

To get started with xsl-fo I found the following links useful:

XSL-FO Tutorial
XSL Standard
Apache FOP Compliance Page
XSL-FO: Ready for Prime Time? outlines the problem xsl-fo tries to solve
For three quick intros see a, b and c

So what's all this XSL-FO, XSLT stuff and all the other things? The article "XSL-FO: Ready for Prime Time?" lays it out as:

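The Extensible Stylesheet Language Family (XSL): XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:

XSL Transformations (XSLT), a language for transforming XML

The XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification.)

XSL Formatting Objects (XSL-FO), an XML vocabulary for specifying formatting semantics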
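As a small illustration of XPath as "an expression language for referring to parts of an XML document", here is a sketch that counts the paragraphs in an XHTML body using .NET's `XmlDocument`. The class and method names are invented for the example. The `h` prefix is bound in code because XHTML elements live in a namespace, which is a common stumbling block when a query unexpectedly matches nothing.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Counts <p> elements that are direct children of <body>.
    public static int CountParagraphs(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // XPath 1.0 has no default namespace, so bind a prefix explicitly.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("h", "http://www.w3.org/1999/xhtml");

        return doc.SelectNodes("//h:body/h:p", ns).Count;
    }
}
```

The same `//h:body/h:p` style of expression is what goes into the `match` and `select` attributes of an XSLT stylesheet.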

My advice? Run. Find another way. Find another solution. Generate LaTeX files and convert them into PDFs. Generate something else. Make Word documents and print them using PDFCreator. Generate images. Control Firefox to print pages as PDFs. Find a way to avoid needing PDFs at all. Anything, as long as it isn't fighting HTML, XSL-FO, FOP, XSLT and XPath.

PS: Let me know if you need any help. :-)
