Question
I have a text containing quoted-printable sequences. Here is an example of such a text (from a Wikipedia article):
I am looking for a Java class that decodes the encoded form to characters, e.g., =20 to a space.
UPDATE: Thanks to The Elite Gentleman, I know that I need to use QuotedPrintableCodec:
    import org.apache.commons.codec.DecoderException;
    import org.apache.commons.codec.net.QuotedPrintableCodec;
    import org.junit.Test;

    public class QuotedPrintableCodecTest {
        private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

        @Test
        public void processSimpleText() throws DecoderException {
            QuotedPrintableCodec.decodeQuotedPrintable(TXT.getBytes());
        }
    }
However, I keep getting the following exception:
org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)
What am I doing wrong?
UPDATE 2: I found this question on SO and learned about MimeUtility:
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.StringWriter;
    import javax.mail.MessagingException;
    import javax.mail.internet.MimeUtility;
    import org.junit.Test;

    public class QuotedPrintableCodecTest {
        private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

        @Test
        public void processSimpleText() throws MessagingException, IOException {
            InputStream is = new ByteArrayInputStream(TXT.getBytes());
            // Wrap the raw bytes in a decoding stream for the quoted-printable transfer encoding.
            BufferedReader br = new BufferedReader(new InputStreamReader(MimeUtility.decode(is, "quoted-printable")));
            StringWriter writer = new StringWriter();
            String line;
            while ((line = br.readLine()) != null) {
                writer.append(line);
            }
            System.out.println("INPUT: " + TXT);
            System.out.println("OUTPUT: " + writer.toString());
        }
    }
However, the output is still not perfect; it still contains '=':
INPUT: If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.
What am I doing wrong now?
Recommended answer
The Apache Commons Codec QuotedPrintableCodec class is an implementation of the Quoted-Printable section of RFC 1521.
Update: your quoted-printable string is wrong, because the example on Wikipedia uses soft line breaks.
Soft line breaks:
Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
that encoded lines be no more than 76 characters long. If longer
lines are to be encoded with the Quoted-Printable encoding, 'soft'
line breaks must be used. An equal sign as the last character on a
encoded line indicates such a non-significant ('soft') line break
in the encoded text. Thus if the "raw" form of the line is a
single unencoded line that says:
Now's the time for all folk to come to the aid of
their country.
This can be represented, in the Quoted-Printable encoding, as
Now's the time =
for all folk to come=
to the aid of their country.
This provides a mechanism with which long lines are encoded in
such a way as to be restored by the user agent. The 76 character
limit does not count the trailing CRLF, but counts all other
characters, including any equal signs.
So your text should be constructed as follows:
    private static final String CRLF = "\r\n";
    private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";
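To see why the original string trips the decoder while the CRLF version is well formed, it can help to look at what follows each '='. Under the rule quoted above, an '=' is either the start of a two-digit hex escape or the last character on an encoded line. The following is a minimal sketch using only the JDK; the class name QpEscapeCheck and the method firstBadEscape are hypothetical names made up for this illustration and are not part of Commons Codec or JavaMail.

```java
// Hypothetical helper for illustration only; not a library API.
class QpEscapeCheck {

    // Returns the index of the first '=' that is followed neither by two
    // hex digits nor by CRLF (a soft line break), or -1 if there is none.
    static int firstBadEscape(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '=') {
                continue;
            }
            boolean softBreak = s.startsWith("\r\n", i + 1);
            boolean hexPair = i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0;
            if (!softBreak && !hexPair) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String broken = "If you believe that truth=3Dbeauty, then surely=20="
                + "mathematics is the most beautiful branch of philosophy.";
        String fixed = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                + "mathematics is the most beautiful branch of philosophy.";

        int bad = firstBadEscape(broken);
        char offender = broken.charAt(bad + 1);
        // The character after the stray '=' is 'm', whose code is 109,
        // the same number reported in the DecoderException message.
        System.out.println("offending char: " + offender + " (" + (int) offender + ")");
        System.out.println("fixed string has bad escape: " + (firstBadEscape(fixed) != -1));
    }
}
```

In the original string the trailing '=' is glued directly to "mathematics", so a decoder that expects two hex digits reads 'm' and rejects it. With CRLF after the '=', the same character becomes a soft line break.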
The Javadoc clearly states:
There is also a bug logged against Apache QuotedPrintableCodec, because it does not support soft line breaks.
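If the codec at hand does not handle soft line breaks, one workaround is to decode the text by hand. The following is a minimal sketch, not the Commons Codec or JavaMail implementation: it covers only "=XX" hex escapes and soft line breaks ("=" followed by CRLF, or by a bare LF as a leniency), and it assumes the decoded bytes are UTF-8. The class name SoftBreakQpDecoder is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for illustration only; not a library API.
class SoftBreakQpDecoder {

    static String decode(String qp) {
        // Quoted-printable text is ASCII, so each char maps to one byte.
        byte[] in = qp.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b); // literal character, copied through unchanged
                i++;
                continue;
            }
            // Soft line break: "=" + CRLF is dropped from the output.
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;
                continue;
            }
            // Leniency (an assumption of this sketch): also accept "=" + LF.
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;
                continue;
            }
            // Hex escape: "=" followed by two hex digits encodes one byte.
            if (i + 2 < in.length) {
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            throw new IllegalArgumentException("Malformed escape at index " + i);
        }
        // Assumption: the decoded bytes are UTF-8 text.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                + "mathematics is the most beautiful branch of philosophy.";
        System.out.println(decode(s));
    }
}
```

Throwing on a malformed escape mirrors the strict behaviour seen in the question; a more lenient decoder could instead copy the stray '=' through unchanged.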