Question
I am creating a simple iCalendar file in C# and am finding content folding, as described in Section 4.1 of RFC 2445, to be quite a headache (for me :-).
http://www.apps.ietf.org/rfc/rfc2445.html#sec-4.1
For long lines, you are supposed to escape some characters (backslash, semicolon, comma and newline, I believe) and then fold the line so that no line is longer than 75 octets. I found several straightforward ways of doing this on the web. The simplest is to replace the characters in question with their escaped versions and then insert a CRLF after every 75th character. Something like:
// too simple, could break at an escape sequence boundary or multi-byte character may overflow 75 octets
txt = txt.Replace(@"\", "\\\\").Replace(";", "\\;").Replace(",", "\\,").Replace("\r\n", "\\n");
var regex = new System.Text.RegularExpressions.Regex( ".{75}");
var escape_and_folded = regex.Replace( txt, "$0\r\n ");
I see two issues. First, the CRLF may be inserted in the middle of an escape sequence. For example, if the insertion falls inside an escaped newline sequence, "\n" becomes "\CRLF" and the "n" ends up on the next line. Second, there are multi-byte characters: since the count is per character rather than per octet, a line may end up longer than 75 octets.
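To make the second issue concrete, here is a small sketch comparing the character count with the UTF-8 octet count of the same string. The class name OctetCountDemo is made up for illustration, and the sketch assumes the calendar is written out as UTF-8.

```csharp
using System;
using System.Text;

class OctetCountDemo
{
    static void Main()
    {
        // 75 characters, each of which needs 3 bytes in UTF-8.
        string line = new string('日', 75);

        // Number of UTF-16 chars, which is what ".{75}" counts.
        Console.WriteLine(line.Length);                       // prints 75

        // Number of octets actually written when encoded as UTF-8.
        Console.WriteLine(Encoding.UTF8.GetByteCount(line));  // prints 225
    }
}
```

A line that looks like it is exactly at the limit by character count can therefore be three times over the limit in octets.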
A simple solution is to walk the string character by character, escaping and folding as I go, but this seems rather brute force. Does anybody have a more elegant solution?
Recommended answer
First off, make sure you look at RFC 5545 instead; RFC 2445 is obsolete. You can find my PHP implementation here:
https://github.com/fruux/sabre-vobject/blob/master/lib/Property.php#L252
In PHP we have the mb_strcut function. I'm not sure if there's a .NET equivalent, but that would at the very least make things a lot simpler. I've had no issues so far with folding escape sequences (\n) in half. A good parser will first unfold the lines, and only then deal with unescaping, especially since which characters must be escaped depends on the actual property (sometimes , or ; gets escaped, sometimes not).
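For the C# side, a minimal sketch of the same idea might look like the following. It is not a port of the linked PHP code: the names ICalFolding, EscapeText and Fold are hypothetical, UTF-8 output is assumed, and the escaping shown is the one for TEXT values only. It escapes first, then folds by counting UTF-8 octets one code point at a time, so a multi-byte character is never split and the leading space of a continuation line counts toward the 75-octet limit. Escape sequences may still be split across a fold, which is fine when the parser unfolds before unescaping.

```csharp
using System;
using System.Text;

static class ICalFolding
{
    // Escapes a TEXT value (backslash, semicolon, comma, newline).
    // Assumption: other value types need different or no escaping.
    public static string EscapeText(string value)
    {
        return value
            .Replace("\\", "\\\\")   // backslash first, so later escapes are not doubled
            .Replace(";", "\\;")
            .Replace(",", "\\,")
            .Replace("\r\n", "\\n")
            .Replace("\n", "\\n");
    }

    // Folds one content line so that no physical line exceeds maxOctets
    // UTF-8 bytes (the CRLF itself is not counted).
    public static string Fold(string line, int maxOctets = 75)
    {
        var sb = new StringBuilder();
        int used = 0; // octets already on the current physical line
        int i = 0;
        while (i < line.Length)
        {
            // Take a whole code point so surrogate pairs stay together.
            int len = char.IsSurrogatePair(line, i) ? 2 : 1;
            int octets = Encoding.UTF8.GetByteCount(line.Substring(i, len));
            if (used + octets > maxOctets)
            {
                sb.Append("\r\n "); // fold: CRLF followed by one space
                used = 1;           // the leading space counts as one octet
            }
            sb.Append(line, i, len);
            used += octets;
            i += len;
        }
        return sb.ToString();
    }
}
```

A call such as ICalFolding.Fold("DESCRIPTION:" + ICalFolding.EscapeText(text)) folds the whole content line, property name included. This is still the character-by-character walk from the question, just with the octet counting made explicit.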