Problem description
I regularly receive a generated email message containing a text part and a text attachment. I want to test whether the attachment is base64-encoded and, if it is, decode it like this:
:0B
* ^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))
{
msgID=`printf '%s' "$MATCH" | base64 -d`
}
But it always says "invalid input". Does anyone know what's wrong? The procmail log shows:
procmail: Match on "^()\/[a-z]+[0-9]+[^\+]"
procmail: Assigning "msgID=PGh0b"
procmail: matched "^(Content-Disposition: *attachment.*(($)[a-z0-9].*)* |Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($)"
procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL
procmail: Assigning "msgID=PGh0b"
procmail: Match on "^(Content-Transfer-Encoding: *base64(($)[a-z0-9].*)*($))"
procmail: Executing "printf '%s' "$MATCH" | base64 -d"
base64: invalid input
procmail: Assigning "msgID=<ht"
procmail: Unexpected EOL
Recommended answer
If your requirements are complex, it may be easier to write a dedicated script that extracts the information you want. A modern scripting language with proper MIME support is far more versatile when it comes to the many possible content encodings and body-part structures found in modern MIME email.
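A minimal sketch of that approach in Python, using only the standard email package (an illustration, not code from the original answer). It assumes the data you want is in the first text/* part whose Content-Disposition is attachment; the script name attachment_text.py and its "-" means standard input convention are hypothetical. Because get_content() undoes the Content-Transfer-Encoding, the base64 case needs no separate test.

```python
import email
import sys
from email import policy
from typing import List, Optional


def attachment_text(raw: bytes) -> Optional[str]:
    """Return the decoded text of the first text/* part marked as an attachment.

    Assumption: only the first such part matters; adapt if several can occur.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no content of their own
        if part.get_content_disposition() != "attachment":
            continue
        if part.get_content_maintype() != "text":
            continue
        # get_content() reverses the transfer encoding (base64,
        # quoted-printable, 7bit/8bit) and decodes with the part's charset.
        return part.get_content()
    return None


def main(argv: List[str]) -> int:
    # Hypothetical CLI: "-" reads the message from stdin, anything else is a file path.
    if argv[1] == "-":
        raw = sys.stdin.buffer.read()
    else:
        with open(argv[1], "rb") as fh:
            raw = fh.read()
    text = attachment_text(raw)
    if text is None:
        return 1  # no text attachment found
    sys.stdout.write(text)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

For example, python3 attachment_text.py - < message.eml prints the decoded attachment text and exits with status 1 when there is none; how you wire it into your procmail setup depends on your configuration.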
The recipe below finds the first occurrence of MIME headers with Content-Disposition: attachment and extracts the first token of the body that follows. This might do what you want if your correspondent uses a well-defined, static template. There is no real MIME parsing here, so (for example) a forwarded message that happens to contain an embedded part matching the pattern will also trigger the condition. (This can be a bug, or a feature.)
A useful but rarely used feature of Procmail is the ability to write a regular expression that spans multiple lines. Within a regex, ($) always matches a literal newline. With that, we can look for a Content-Disposition: attachment header, followed by zero or more other headers, followed by an empty line, followed by the token you want to extract.
:0B
* ^Content-Disposition: *attachment.*(($)[A-Z].*)*($)($)\/[A-Z]+[0-9]+
{ msgid="$MATCH" }
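To make the multi-line regex concrete, here is a rough Python transcription of that condition (an illustration only; procmail does not run it). ($) becomes \n, the part after \/ becomes a capture group, and re.IGNORECASE mirrors procmail's default case-insensitive matching. The helper name extract_token is hypothetical.

```python
import re
from typing import Optional

# Python transcription of the recipe's condition
#   ^Content-Disposition: *attachment.*(($)[A-Z].*)*($)($)\/[A-Z]+[0-9]+
# procmail's ($) matches a newline, so it becomes \n; the part after \/
# (what procmail stores in MATCH) becomes capture group 1.
ATTACHMENT_TOKEN = re.compile(
    r"^Content-Disposition: *attachment.*"  # the attachment header line
    r"(?:\n[A-Z].*)*"                       # zero or more further header lines
    r"\n\n"                                 # end of the last header + the empty line
    r"([A-Z]+[0-9]+)",                      # the token procmail would put in MATCH
    re.MULTILINE | re.IGNORECASE,           # procmail ignores case unless the D flag is set
)


def extract_token(body: str) -> Optional[str]:
    """Return the first matching token, or None (hypothetical helper name)."""
    m = ATTACHMENT_TOKEN.search(body)
    return m.group(1) if m else None
```

Because the pattern inspects the raw body, a base64-encoded attachment yields a fragment of the encoded text: for PGh0bWw+ (base64 for <html>) it returns PGh0, so such parts need decoding before the token is meaningful.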
For simplicity, I have not attempted to cope with multi-line MIME headers. If you want to support that, the fix should be reasonably obvious, though not at all elegant.
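One possible reading of that fix, sketched in Python for clarity: continuation lines of a folded header (RFC 5322) begin with a space or tab, so the "further header line" part of the pattern can also accept lines starting with whitespace. Presumably the same change applies to the [A-Z] class in the procmail condition. This sketch still assumes "attachment" sits on the same line as "Content-Disposition:", and the names are hypothetical.

```python
import re
from typing import Optional

# Variant of the recipe's condition that tolerates folded (multi-line) headers.
# A header line is taken to be any line starting with a letter, space or tab;
# the empty line still ends the header block.
FOLDED_TOKEN = re.compile(
    r"^Content-Disposition: *attachment.*"
    r"(?:\n[A-Z \t].*)*"  # further header lines or continuation lines
    r"\n\n"               # end of the last header line + the empty line
    r"([A-Z]+[0-9]+)",    # the token procmail's \/ would capture
    re.MULTILINE | re.IGNORECASE,  # procmail is case-insensitive by default
)


def extract_token_folded(body: str) -> Optional[str]:
    m = FOLDED_TOKEN.search(body)
    return m.group(1) if m else None
```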
In the somewhat more general case, you might want to add a condition to check that the group of MIME headers in the condition also contains Content-Type: text/plain. You will need to set up two alternatives, one with Content-Type: before Content-Disposition: and one with it after (or somehow normalize the MIME headers before getting to this recipe, or trust that the sender always generates them in exactly the order in the sample message).
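The two-alternative idea can be sketched the same way in Python (illustrative only; all names are hypothetical). One alternative has Content-Disposition: attachment before Content-Type: text/plain, the other has it after; both then allow further headers before the empty line and the token. Like the recipe, it does not handle folded headers.

```python
import re
from typing import Optional

# Building blocks for a condition that requires both Content-Disposition:
# attachment and Content-Type: text/plain in the same header group, in either order.
OTHER_HEADERS = r"(?:\n[A-Z].*)*"  # zero or more further header lines
DISPOSITION = r"Content-Disposition: *attachment.*"
TEXT_TYPE = r"Content-Type: *text/plain.*"

TEXT_ATTACHMENT_TOKEN = re.compile(
    r"^(?:"
    + DISPOSITION + OTHER_HEADERS + r"\n" + TEXT_TYPE  # disposition first
    + r"|"
    + TEXT_TYPE + OTHER_HEADERS + r"\n" + DISPOSITION  # type first
    + r")"
    + OTHER_HEADERS + r"\n\n"  # rest of the header block + the empty line
    + r"([A-Z]+[0-9]+)",       # the token to extract
    re.MULTILINE | re.IGNORECASE,  # mirror procmail's default case-insensitivity
)


def extract_text_attachment_token(body: str) -> Optional[str]:
    m = TEXT_ATTACHMENT_TOKEN.search(body)
    return m.group(1) if m else None
```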