Problem description
I've been implementing some PHP/IMAP-based email handling functionality lately, and have most everything working great, except for message body decoding (in some circumstances).
I think that, by now, I've half-memorized RFC 2822 (the 'Internet Message Format' document guidelines), read through email-handling code for half a dozen open source CMSes, and read a bajillion forum posts, blog posts, etc. dealing with handling email in PHP.
I've also forked and completely rewritten a class for PHP, Imap, and the class handles email respectably well—I have some helpful methods in there to detect autoresponders (for out of office, old addresses, etc.), decode base64 and 8bit messages, etc.
However, the one thing I simply can't get to work reliably (or, sometimes, at all) is decoding a message that comes in with Content-Transfer-Encoding: 7bit.
It seems that different email clients/services interpret 7BIT to mean different things. I've gotten some emails that are supposedly 7BIT that are actually Base64-encoded. I've gotten some that are actually quoted-printable-encoded. And some that are not encoded in any way whatsoever. And some that are HTML, but aren't indicated as being HTML, and they're also listed as 7BIT...
Here are a few examples (snips) of message bodies received with 7Bit encodings:
1:
A random message=20
Sent from my iPhone
2:
PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwi
IHhtbG5zOm89InVybjpzY2hlbWFzLW1pY3Jvc29mdC1jb206b2ZmaWNlOm9m
3:
tangerine apricot pepper.=0A=C2=A0=0ALet me know if you have any availabili=
ty over the next month or so. =0A=C2=A0=0AThank you,=0ANames Withheld=0A908=
-319-5916=0A=C2=A0=0A=C2=A0=0A=C2=A0=0A=0A=0A______________________________=
__=0AFrom: Names Witheld =0ATo: Names Withheld=
These are all sent with '7Bit' encodings (well, at least according to PHP/imap_*), but they're obviously in need of more decoding before I can pass them along as plaintext. Is there any way to reliably convert all messages with supposedly-7Bit encodings to plaintext?
After spending a bit more time, I decided to just write up some heuristic detection, as Max suggested in the comments on my original question.
I've built a more robust decode7Bit() method in Imap.php, which goes through a bunch of common encoded characters (like =A0) and replaces them with their UTF-8 equivalents, and then also decodes messages if they look like they are base64-encoded:
/**
 * Decodes 7-Bit text.
 *
 * PHP seems to think that most emails are 7BIT-encoded, therefore this
 * decoding method assumes that text passed through may actually be base64-
 * encoded, quoted-printable encoded, or just plain text. Instead of passing
 * the email directly through a particular decoding function, this method
 * runs through a bunch of common encoding schemes to try to decode everything
 * and simply end up with something *resembling* plain text.
 *
 * Results are not guaranteed, but it's pretty good at what it does.
 *
 * @param $text (string)
 *   7-Bit text to convert.
 *
 * @return (string)
 *   Decoded text.
 */
public function decode7Bit($text) {
  // If there are no spaces on the first line, assume that the body is
  // actually base64-encoded, and decode it.
  $lines = explode("\r\n", $text);
  $first_line_words = explode(' ', $lines[0]);
  if ($first_line_words[0] == $lines[0]) {
    $text = base64_decode($text);
  }

  // Manually convert common encoded characters into their UTF-8 equivalents.
  // '=C2=A0' is listed before '=A0' so the shorter pattern cannot match
  // inside the longer one and leave a stray '=C2' behind.
  $characters = array(
    '=20' => ' ', // space.
    '=E2=80=99' => "'", // single quote.
    '=0A' => "\r\n", // line break.
    '=C2=A0' => ' ', // non-breaking space (UTF-8 encoded).
    '=A0' => ' ', // non-breaking space (single byte).
    "=\r\n" => '', // joined line.
    '=E2=80=A6' => '…', // ellipsis.
    '=E2=80=A2' => '•', // bullet.
  );

  // Loop through the encoded characters and replace any that are found.
  foreach ($characters as $key => $value) {
    $text = str_replace($key, $value, $text);
  }

  return $text;
}
This was taken from version 1.0-beta2 of the Imap class for PHP that I have on GitHub.
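One weak spot in the heuristic above is the base64 test: any body whose first line is a single word (for example "Thanks,") has no spaces and would be run through base64_decode(). A stricter check might look something like the sketch below. The function name looksLikeBase64() is hypothetical and not part of the Imap class; it only relies on base64_decode() in strict mode and a re-encode comparison.

```php
<?php
/**
 * Hypothetical helper (not part of the Imap class): returns true only if
 * $text survives a strict base64 round trip.
 */
function looksLikeBase64(string $text): bool {
  // Drop the line breaks and other whitespace that wrapped bodies contain.
  $compact = preg_replace('/\s+/', '', $text);

  // Empty input, or a length that is not a multiple of 4, is rejected.
  if ($compact === '' || strlen($compact) % 4 !== 0) {
    return false;
  }

  // Strict mode returns false for characters outside the base64 alphabet.
  $decoded = base64_decode($compact, true);
  if ($decoded === false) {
    return false;
  }

  // Re-encoding the decoded bytes must reproduce the input exactly.
  return base64_encode($decoded) === $compact;
}

// Example 2 from the question is accepted; example 1 is not.
$html = "PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwi\r\n"
  . "IHhtbG5zOm89InVybjpzY2hlbWFzLW1pY3Jvc29mdC1jb206b2ZmaWNlOm9m";
var_dump(looksLikeBase64($html));
var_dump(looksLikeBase64("A random message=20\r\nSent from my iPhone"));
```

This is still only a heuristic: a very short body such as the single word "Test" is valid base64 and would pass, so a minimum length check may be worth adding on top.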
If you have any ideas for making decode7Bit() more efficient, let me know. I originally tried running everything through quoted_printable_decode(), but sometimes PHP would throw exceptions that were vague and unhelpful, so I gave up on that approach.
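For comparison, here is a minimal sketch of the quoted_printable_decode() route, guarded so that it only runs on bodies that appear to contain quoted-printable escapes. The helper name decodeIfQuotedPrintable() is hypothetical and not part of the Imap class, and the sketch makes no attempt to reproduce or work around the errors mentioned above. It assumes that a body with no =XX escapes and no soft line breaks can be passed through untouched.

```php
<?php
/**
 * Hypothetical helper: decodes $text as quoted-printable only when it
 * contains =XX hex escapes or soft line breaks; otherwise returns it as-is.
 */
function decodeIfQuotedPrintable(string $text): string {
  // Match "=" followed by two uppercase hex digits, or "=" directly
  // before a line break (a soft line break).
  if (preg_match('/=(?:[0-9A-F]{2}|\r?\n)/', $text) !== 1) {
    return $text;
  }

  return quoted_printable_decode($text);
}

// Example 1 from the question: the trailing "=20" becomes a space.
echo decodeIfQuotedPrintable("A random message=20\r\nSent from my iPhone");
```

Unlike the manual replacement table, this covers every =XX sequence rather than a fixed list. The trade-off is false positives: ordinary text such as "x=10" also matches the pattern and would be decoded.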