Java InputStream encoding/charset

Problem description

Running the following (example) code:

import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        byte[] buf = {-27};
        InputStream is = new ByteArrayInputStream(buf);
        BufferedReader r = new BufferedReader(
                new InputStreamReader(is, "ISO-8859-1"));
        String s = r.readLine();
        System.out.println("test.java:9 [byte] (char)" + (char)s.getBytes()[0] +
                " (int)" + (int)s.getBytes()[0]);
        System.out.println("test.java:10 [char] (char)" + (char)s.charAt(0) +
                " (int)" + (int)s.charAt(0));
        System.out.println("test.java:11 string below");
        System.out.println(s);
        System.out.println("test.java:13 string above");
    }
}

gives me this output:


test.java:9 [byte] (char)? (int)63
test.java:10 [char] (char)? (int)229
test.java:11 string below
?
test.java:13 string above

How do I retain the correct byte value (-27) in the line-9 printout, and consequently get the expected output (å) from the System.out.println(s) call?

Recommended answer

If you want to retain byte values, ideally don't use a Reader at all. To represent arbitrary binary data as text and convert it back to binary data later, use base16 (hex) or base64 encoding.
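As a rough illustration of the base64 suggestion, here is a minimal sketch that assumes Java 8 or later, where java.util.Base64 is available. The class name Base64RoundTrip and the helpers toText and fromText are made up for this example and are not part of the original question.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class for illustration only.
class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-safe Base64 text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode Base64 text back into the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};            // same byte as in the question
        String text = toText(buf);     // safe to pass around as text
        byte[] back = fromText(text);  // original bytes recovered
        System.out.println(text);                  // prints 5Q==
        System.out.println(Arrays.toString(back)); // prints [-27]
    }
}
```

Because the Base64 text contains only ASCII characters, it should survive any charset the surrounding code happens to use, which is the point of the recommendation.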

However, to explain what's going on: calling s.getBytes() with no argument uses the platform's default character encoding, which apparently cannot represent the Unicode character U+00E5 (å). The unmappable character is evidently replaced with '?' (byte value 63), which is what the line-9 printout shows.
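One way to check this explanation on a given machine is to ask the default charset whether it can encode U+00E5. The sketch below is only illustrative: the class name DefaultCharsetCheck and the helper canEncode are invented here, and US-ASCII is used as a stand-in for a default charset that lacks the character, since the asker's actual default is not stated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class DefaultCharsetCheck {
    // True if the charset has a byte sequence for the given character.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char aRing = '\u00E5'; // å
        Charset def = Charset.defaultCharset();
        // Result depends on the platform, so no expected output is shown.
        System.out.println("default charset: " + def
                + ", can encode U+00E5: " + canEncode(def, aRing));

        // Stand-in (assumption): US-ASCII cannot encode å, so
        // getBytes(Charset) substitutes the replacement byte '?'.
        byte[] ascii = String.valueOf(aRing).getBytes(StandardCharsets.US_ASCII);
        System.out.println(ascii[0]); // prints 63
    }
}
```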

If you call s.getBytes("ISO-8859-1") everywhere instead of s.getBytes(), I suspect you'll get back the right byte value... but relying on ISO-8859-1 for this is kinda dirty IMO.
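To make the ISO-8859-1 round trip concrete, here is a small sketch that decodes and re-encodes with the same explicit charset. It uses StandardCharsets.ISO_8859_1 (Java 7+) rather than the string name, which is a substitution made for this example. The class name Latin1RoundTrip and the helper roundTrip are likewise invented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class Latin1RoundTrip {
    // Decode bytes as ISO-8859-1 and encode them again with the same charset.
    static byte[] roundTrip(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};
        String s = new String(buf, StandardCharsets.ISO_8859_1);
        System.out.println((int) s.charAt(0)); // prints 229 (U+00E5)
        System.out.println(roundTrip(buf)[0]); // prints -27
    }
}
```

This works because ISO-8859-1 maps each of the 256 byte values to exactly one character. Whether System.out.println(s) then displays å is a separate matter that depends on the encoding of System.out and of the console.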

