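I recently learned of a project called CommonMark, which correctly identifies and addresses the ambiguities in the original Markdown specification: http://commonmark.org/ It has great C# library support.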


The source that comes with the download is written in Perl, which I have no intention of honoring. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!

I'm about to hand-code a parser for Markdown. What is your experience with this?

If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, that is, a third-party library).

To help a bit with the answers, regular expressions are meant to identify patterns! NOT to parse an entire grammar. That people consider doing so is foobar.

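  • If you think about Markdown, it is fundamentally based on the concept of paragraphs.
  • A reasonable approach might therefore be to split the input into paragraphs.
  • There are many kinds of paragraphs, for example headings, text, lists, blockquotes, and code.
  • The challenge is thus to identify these paragraphs and the context in which they occur.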
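To make the paragraph idea concrete, here is a minimal sketch in Python (my choice for illustration, not something from any existing implementation). It assumes paragraphs are separated by blank lines and classifies each one by looking at its first lines only. The names `split_blocks` and `classify` are hypothetical, and the rules are deliberately simplified: nesting, code blocks containing blank lines, and horizontal rules are not handled.

```python
import re
from typing import List

# A list item starts with a bullet (*, +, -) or "1." followed by whitespace.
# This is pattern identification only, not grammar parsing.
LIST_MARKER = re.compile(r"\s{0,3}([*+-]|\d+\.)\s+")


def split_blocks(text: str) -> List[List[str]]:
    """Group consecutive non-blank lines into blocks ("paragraphs")."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the block collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def classify(block: List[str]) -> str:
    """Return a rough kind for one block (simplified rules, see above)."""
    first = block[0]
    # Code: every line is indented by four spaces or a tab.
    if all(l.startswith("    ") or l.startswith("\t") for l in block):
        return "code"
    # ATX-style heading: "# Title".
    if first.startswith("#"):
        return "heading"
    # Setext-style heading: a text line underlined with === or ---.
    if len(block) == 2 and set(block[1].strip()) in ({"="}, {"-"}):
        return "heading"
    if first.lstrip().startswith(">"):
        return "blockquote"
    if LIST_MARKER.match(first):
        return "list"
    return "text"


if __name__ == "__main__":
    sample = "# Title\n\nSome text\nmore text\n\n- one\n- two\n\n> quoted\n\n    code()\n"
    for block in split_blocks(sample):
        print(classify(block), block)
```

The context part of the challenge is what this sketch leaves out: a blockquote or list item can itself contain paragraphs, so a real parser would have to strip the marker and run the same classification again on the content.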


I'll be back with a solution once I have one that is worth sharing.


The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.

Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast: 20% faster than discount, which is written in hand-optimized C, and six hundred times faster than BlueCloth, which is written in Ruby, even though it hasn't been optimized for performance yet. Because it is intended only for Mauricio's own use on his weblog, it deviates from the official Markdown specification in a few places, but Mauricio has created a branch that reverts most of those changes.
