Problem description
My code checks a mailbox and forwards every mail to another user.
However, I found that the same content arrives encoded differently depending on the mail client it was sent from (that is, when it is sent from account@gmail.com, from account@naver.com, and so on).
For example, this is what I typed:
subject: subject
content: this is content
For mail client 1:
358 2020-04-22 18:12:23,249: run: DEBUG: subject has come as: =?utf-8?B?c3ViamVjdA==?=
359 2020-04-22 18:12:23,249: run: DEBUG: content has come as: dGhpcyBpcyBjb250ZW50Cg==
For mail client 2:
178 2020-04-22 18:12:09,636: run: DEBUG: subject has come as: =?utf-8?B?c3ViamVjdA==?=
179 2020-04-22 18:12:09,636: run: DEBUG: content has come as: dGhpcyBpcyBjb250ZW50Cg==
For mail client 3:
300 2020-04-22 18:12:16,494: run: DEBUG: subject has come as: subject
301 2020-04-22 18:12:16,494: run: DEBUG: content has come as: this is content
For clients 1 and 2, the results are the same: both the subject and the content arrive encoded.
For client 3, they arrive as plain text.
A sample of my code, which uses imaplib:
typ, rfc = self.mail.fetch(num, '(RFC822)')
raw_email = rfc[0][1]
raw_email_to_utf8 = raw_email.decode('utf-8')
msg = email.message_from_string(raw_email_to_utf8)
content = msg.get_payload()  # This is the value printed in the debug log above.
Because of this, some mails are forwarded with weird contents. (The subjects are encoded correctly again when forwarded.)
Why is there this difference, and how can I get the contents when they arrive encoded differently?
Recommended answer
Something along the way is encoding text that did not need to be encoded. That is unnecessary, but not prohibited.
RFC 2047 encoding is sometimes necessary, but it is always legal (because permitting it everywhere was simpler than making precise rules). You have to detect RFC 2047 encoding and decode it when it is present. If a word starts with =?, ends with ?=, and contains precisely two question marks in between, then it is RFC 2047-encoded. Most languages have libraries or functions that do the decoding; search for "rfc2047".
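In Python, the standard library's email package can already do this, so a minimal sketch might look like the one below. The helper names decode_subject and decode_text_body and the two sample messages are made up for illustration; the sample messages only mimic the log output from the question. The sketch also assumes that the base64 content in the logs comes from a Content-Transfer-Encoding: base64 body, which is a separate mechanism from RFC 2047 and is undone by get_payload(decode=True).

```python
import email
from email.header import decode_header, make_header


def decode_subject(msg):
    """Return the Subject as text, decoding RFC 2047 encoded-words if any."""
    raw = msg.get("Subject", "")
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header joins them back into a readable string.
    return str(make_header(decode_header(raw)))


def decode_text_body(msg):
    """Return the first text/plain part as text (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


# Hypothetical message resembling what clients 1 and 2 produced.
encoded = email.message_from_bytes(
    b"Subject: =?utf-8?B?c3ViamVjdA==?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"dGhpcyBpcyBjb250ZW50Cg==\r\n"
)

# Hypothetical message resembling what client 3 produced.
plain = email.message_from_bytes(
    b"Subject: subject\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"this is content\r\n"
)

for m in (encoded, plain):
    print(repr(decode_subject(m)), repr(decode_text_body(m).strip()))
```

Parsing the raw bytes from imaplib with email.message_from_bytes, rather than decoding them as UTF-8 first, also avoids failures on mails whose bodies are not valid UTF-8.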