Problem description
I'm developing a multi-part MIME parser using F# and FParsec. I'm developing iteratively, so this is highly unrefined, brittle code that only solves my first immediate problem. Red, Green, Refactor.
I'm required to parse a stream rather than a string, which is really throwing me for a loop. Given that constraint, to the best of my understanding, I need to call a parser recursively. How to do that is beyond my ken, at least with the way I've proceeded thus far.
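One point that may take some pressure off the stream constraint: FParsec runs the very same `Parser` value over a string or a `Stream` (both are wrapped in its `CharStream`), so reading from a stream does not by itself force a different or recursive parser design. A minimal sketch, assuming FParsec is pulled into F# Interactive with `#r "nuget: FParsec"`; `runOnString` and `runOnStream` are hypothetical helper names:

```fsharp
#r "nuget: FParsec"   // dotnet fsi (F# 5+) package reference
open System.IO
open System.Text
open FParsec

// First header line: "content-type: <value>;" -> "<value>" (the ';' is consumed).
let pContentType : Parser<string, unit> =
    skipStringCI "content-type: " >>. manyCharsTill anyChar (pchar ';')

// Run the parser over an in-memory string.
let runOnString (input: string) =
    match runParserOnString pContentType () "string" input with
    | Success (v, _, _) -> v
    | Failure (msg, _, _) -> failwith msg

// Run the identical parser over a Stream, as MParser does.
let runOnStream (input: string) =
    use ms = new MemoryStream(Encoding.ASCII.GetBytes input)
    match runParserOnStream pContentType () "stream" ms Encoding.ASCII with
    | Success (v, _, _) -> v
    | Failure (msg, _, _) -> failwith msg

let line = "content-type: Multipart/related; boundary=\"RN-Http-Body-Boundary\""
printfn "%s / %s" (runOnString line) (runOnStream line)
```

Both calls should yield `Multipart/related`; only the entry point differs.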
namespace MultipartMIMEParser

open FParsec
open System.IO

type private Post = { contentType : string
                    ; boundary    : string
                    ; subtype     : string option // "; type=..." is optional
                    ; content     : string }

type MParser (s:Stream) =
  let ($) f x = f x
  let ascii = System.Text.Encoding.ASCII
  let str cs = System.String.Concat (cs:char list)
  let q = "\""
  let qP = pstring q
  let pSemicolon = pstring ";"
  let manyNoDoubleQuote = many $ noneOf q
  let enquoted = between qP qP manyNoDoubleQuote |>> str
  let skip = skipStringCI
  // Everything before the first ';' (the ';' itself is consumed).
  let pContentType = skip "content-type: "
                     >>. manyTill anyChar pSemicolon
                     |>> str
  let pBoundary = skip " boundary=" >>. enquoted
  // `$` has the same precedence as `>>.` and is left-associative, so
  // `opt $ a >>. b` would parse as `(opt a) >>. b`; the parentheses make opt cover the whole clause.
  let pSubtype = opt (pSemicolon >>. skip " type=" >>. enquoted)
  let pContent = many anyChar |>> str // TODO: The content parser needs to recurse on the stream.
  let pStream = pipe4 pContentType pBoundary pSubtype pContent
                  $ fun c b t s -> { contentType=c; boundary=b; subtype=t; content=s }
  let result s =
    match runParserOnStream pStream () "" s ascii with
    | Success (r,_,_) -> r
    | Failure (e,_,_) -> failwith (sprintf "%A" e)
  let r = result s
  member p.ContentType = r.contentType
  member p.Boundary = r.boundary
  member p.ContentSubtype = r.subtype
  member p.Content = r.content
The first line of the example POST follows:
content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"
It spans a single line in the file. Further sub-parts in the content include content-type values that span multiple lines, so I know I'll have to refine my parsers if I am to reuse them.
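For the multi-line values, MIME headers use RFC 5322 folding: a line that begins with a space or tab continues the previous header field. A sketch of a header parser that accepts folded values, assuming `\n` line endings and continuation lines that start with whitespace; `pHeader`, `pContinuation` and `parseHeaders` are hypothetical names:

```fsharp
#r "nuget: FParsec"   // dotnet fsi (F# 5+) package reference
open FParsec

// Spaces and tabs only; FParsec's `spaces` would also skip line breaks.
let skipWsp : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')

// "Content-Type:" -> "Content-Type"
let pFieldName : Parser<string, unit> = many1Chars (noneOf ":\r\n") .>> pchar ':'

// A continuation line must start with at least one space or tab.
let pContinuation : Parser<string, unit> = skipMany1 (anyOf " \t") >>. restOfLine true

// Name, first line of the value, then any folded continuation lines.
// Each fold is collapsed to one space (a simplification of RFC 5322 unfolding).
let pHeader : Parser<string * string, unit> =
    pipe3 pFieldName (skipWsp >>. restOfLine true) (many pContinuation)
          (fun name first rest -> name, String.concat " " (first :: rest))

let parseHeaders (input: string) =
    match run (many pHeader) input with
    | Success (headers, _, _) -> headers
    | Failure (msg, _, _) -> failwith msg

let sample =
    "Content-Type: multipart/related; type=\"application/xml\";\n"
    + "\tboundary=\"----=_Part_235_11184805.1160080657052\"\n"
    + "Mime-Version: 1.0\n"

parseHeaders sample |> List.iter (fun (n, v) -> printfn "%s => %s" n v)
```

The first header comes back as a single value containing both `type=` and `boundary=`, which can then be split into parameters by a second, smaller parser.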
Somehow I've got to call pContent with the (string?) result of pBoundary so that I can split the rest of the stream on the appropriate boundaries, and then somehow return multiple parts for the content of the post, each of which will be a separate post with headers and content (which will obviously have to be something other than a string). My head is spinning. This code already seems far too complex to parse a single line.
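FParsec's bind operator `>>=` is the usual way to build the next parser from a previous parser's result, which is what handing the pBoundary result to the content parser amounts to. A sketch under simplifying assumptions (the delimiter `--boundary` never occurs inside a part, `--boundary--` closes the body, and parts come back as raw strings); `pMultipart` and `pParts` are hypothetical names:

```fsharp
#r "nuget: FParsec"   // dotnet fsi (F# 5+) package reference
open System
open FParsec

// boundary="RN-Http-Body-Boundary" -> "RN-Http-Body-Boundary"
let pBoundary : Parser<string, unit> =
    skipStringCI "boundary=\"" >>. manyCharsTill anyChar (pchar '"')

// A parser factory: given the boundary b, split the body into its parts.
let pParts (b: string) : Parser<string list, unit> =
    let delim = "--" + b
    let part =
        newline >>. charsTillString delim true Int32.MaxValue
        |>> fun s -> s.TrimEnd('\r', '\n')        // drop the line break before the delimiter
    skipCharsTillString delim true Int32.MaxValue // skip any preamble and the first delimiter
    >>. manyTill part (skipString "--")           // "--" right after a delimiter ends the body

// >>= feeds the boundary parsed from the header line into pParts.
let pMultipart : Parser<string * string list, unit> =
    skipStringCI "content-type: " >>. skipCharsTillString ";" true Int32.MaxValue
    >>. spaces >>. pBoundary .>> skipRestOfLine true
    >>= fun b -> pParts b |>> fun parts -> (b, parts)

let sample =
    "content-type: Multipart/related; boundary=\"RN-Http-Body-Boundary\"; type=\"multipart/related\"\n"
    + "--RN-Http-Body-Boundary\n"
    + "Content-Type: text/plain\n\nfirst part\n"
    + "--RN-Http-Body-Boundary\n"
    + "Content-Type: text/plain\n\nsecond part\n"
    + "--RN-Http-Body-Boundary--\n"

match run pMultipart sample with
| Success ((b, parts), _, _) -> printfn "boundary %s, %d parts" b (List.length parts)
| Failure (msg, _, _) -> printfn "%s" msg
```

Since each part is returned as text with its own headers, a part that declares another multipart type could be run back through the same header and `pParts` parsers (for example with `runParserOnString`), which is one way to get the recursion without re-reading the original stream.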
Much appreciation for insight and wisdom!
Recommended answer
This is a fragment that might get you going in the right direction.
Get your parsers to spit out something with the same base type. I prefer to use F#'s discriminated unions for this purpose. If you really do need to push values into a Post type, then walk the returned AST. That's just the way I'd approach it.
#if INTERACTIVE
#r"""..\..\FParsecCS.dll""" // ... edit path as appropriate to bin/debug, etc.
#r"""..\..\FParsec.dll"""
#endif
let packet = @"content-type: Multipart/related; boundary=""RN-Http-Body-Boundary""; type=""multipart/related""
--RN-Http-Body-Boundary
Message-ID: <25845033.1160080657073.JavaMail.webmethods@exshaw>
Mime-Version: 1.0
Content-Type: multipart/related; type=""application/xml"";
boundary=""----=_Part_235_11184805.1160080657052""
------=_Part_235_11184805.1160080657052
Content-Type: Application/XML
Content-Transfer-Encoding: binary
Content-Location: RN-Preamble
Content-ID: <1430586.1160080657050.JavaMail.webmethods@exshaw>"
//XML document begins here...
type AST =
  | Document of AST list
  | Header of AST list
  /// i.e. Content-Type is the tag, and it consists of a list of key-value pairs
  | Tag of string * AST list
  | KeyValue of string * string
  | Body of string
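To make a parser emit these AST values directly, map its results onto the union cases with `|>>`. A sketch for the first Content-Type line, assuming the media type is stored as a `KeyValue` under a made-up key `"media-type"` (the union has no dedicated case for it); `param`, `token`, `quoted` and `contentType` are hypothetical names:

```fsharp
#r "nuget: FParsec"   // dotnet fsi (F# 5+) package reference
open System
open FParsec

type AST =
  | Document of AST list
  | Header of AST list
  | Tag of string * AST list
  | KeyValue of string * string
  | Body of string

// Spaces and tabs only.
let ws : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
// "..." without the quotes.
let quoted : Parser<string, unit> = between (pchar '"') (pchar '"') (manySatisfy (fun c -> c <> '"'))
// An unquoted value such as Multipart/related.
let token : Parser<string, unit> =
    many1Satisfy (fun c -> c <> ';' && c <> '=' && c <> '"' && not (Char.IsWhiteSpace c))

// "; name=value" or "; name=\"value\"" becomes KeyValue (name, value).
let param : Parser<AST, unit> =
    pchar ';' >>. ws >>. token .>> pchar '=' .>>. (quoted <|> token) |>> KeyValue

// The whole Content-Type line becomes a single Tag.
let contentType : Parser<AST, unit> =
    skipStringCI "content-type:" >>. ws >>. token .>>. many (attempt (ws >>. param))
    |>> fun (mediaType, ps) -> Tag ("Content-Type", KeyValue ("media-type", mediaType) :: ps)

let line = "content-type: Multipart/related; boundary=\"RN-Http-Body-Boundary\"; type=\"multipart/related\""
match run contentType line with
| Success (ast, _, _) -> printfn "%A" ast
| Failure (msg, _, _) -> printfn "%s" msg
```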
The AST DU above could represent a first pass of the example data you posted in your other question. It could be finer grained than that, but simpler is normally better. I mean, the ultimate destination in your example is a Post type, and you could achieve that with some simple pattern matching.
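The pattern matching could look roughly like this. It assumes a hypothetical document shape of one `Header` holding a Content-Type `Tag` followed by a `Body`, a media type stored under a hypothetical `"media-type"` key, and a `Post` record whose `subtype` is a `string option`; `tryParam` and `toPost` are illustrative names:

```fsharp
type AST =
  | Document of AST list
  | Header of AST list
  | Tag of string * AST list
  | KeyValue of string * string
  | Body of string

// Hypothetical target record; subtype is optional because "; type=..." is optional.
type Post = { contentType : string; boundary : string; subtype : string option; content : string }

// Value of the first KeyValue with the given key, if any.
let tryParam key (items: AST list) =
    items |> List.tryPick (function KeyValue (k, v) when k = key -> Some v | _ -> None)

// Assumed shape: Document [ Header [ Tag ("Content-Type", ...) ]; Body ... ]
let toPost (doc: AST) : Post option =
    match doc with
    | Document [ Header [ Tag ("Content-Type", items) ]; Body body ] ->
        let mk ct b = { contentType = ct; boundary = b; subtype = tryParam "type" items; content = body }
        // Both the media type and the boundary are required; otherwise None.
        Option.map2 mk (tryParam "media-type" items) (tryParam "boundary" items)
    | _ -> None

let items =
    [ KeyValue ("media-type", "Multipart/related")
      KeyValue ("boundary", "RN-Http-Body-Boundary")
      KeyValue ("type", "multipart/related") ]
let doc = Document [ Header [ Tag ("Content-Type", items) ]; Body "(part content)" ]
printfn "%A" (toPost doc)
```

Any tree that does not match the expected shape simply yields `None`, which keeps the conversion step separate from the parsing step.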