Problem description
Running the following (example) code
import java.io.*;
public class test {
    public static void main(String[] args) throws Exception {
        byte[] buf = {-27};
        InputStream is = new ByteArrayInputStream(buf);
        BufferedReader r = new BufferedReader(
            new InputStreamReader(is, "ISO-8859-1"));
        String s = r.readLine();
        System.out.println("test.java:9 [byte] (char)" + (char)s.getBytes()[0] +
            " (int)" + (int)s.getBytes()[0]);
        System.out.println("test.java:10 [char] (char)" + (char)s.charAt(0) +
            " (int)" + (int)s.charAt(0));
        System.out.println("test.java:11 string below");
        System.out.println(s);
        System.out.println("test.java:13 string above");
    }
}
gives me this output
test.java:9 [byte] (char)? (int)63
test.java:10 [char] (char)? (int)229
test.java:11 string below
?
test.java:13 string above
How do I retain the correct byte value (-27) in the line-9 printout, and consequently get the expected output (å) from the System.out.println(s) call?
Recommended answer
If you want to retain byte values, ideally don't use a Reader at all. To represent arbitrary binary data as text and convert it back to binary data later, you should use base16 (hex) or base64 encoding.
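As a rough illustration of the base64 suggestion, here is a minimal sketch that assumes Java 8 or later (for java.util.Base64). The class name Base64RoundTrip and the helpers toText and fromText are made up for this example and are not part of the original question.

```java
import java.util.Base64;

public class Base64RoundTrip {
    // Hypothetical helper: encode arbitrary bytes as ASCII-only text.
    public static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Hypothetical helper: decode that text back into the original bytes.
    public static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] buf = {-27};
        String text = toText(buf);
        byte[] back = fromText(text);
        System.out.println(text);    // prints 5Q==
        System.out.println(back[0]); // prints -27
    }
}
```

Because the text form only contains ASCII characters, it should survive any Reader, Writer, or console encoding on the way, and the byte value is recovered only when you decode it.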
However, to explain what's going on: when you call s.getBytes(), that uses the platform default character encoding, which apparently can't represent the Unicode character U+00E5 (å). The unmappable character is replaced with '?', which is why the byte comes back as 63.
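The effect can be reproduced without depending on the machine's default encoding. The sketch below uses US-ASCII as a stand-in for a default charset that cannot represent U+00E5; that choice is an assumption for illustration, since the asker's actual default encoding is not stated. The class name DefaultCharsetDemo and the method firstByte are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Hypothetical helper: encode the one-character string "å" (U+00E5)
    // with the given charset and return its first byte as an int.
    public static int firstByte(Charset cs) {
        String s = "\u00E5";
        return s.getBytes(cs)[0];
    }

    public static void main(String[] args) {
        // Shows which charset s.getBytes() would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        // US-ASCII has no mapping for U+00E5, so '?' is substituted.
        System.out.println(firstByte(StandardCharsets.US_ASCII));   // prints 63
        // ISO-8859-1 maps U+00E5 to the single byte 0xE5.
        System.out.println(firstByte(StandardCharsets.ISO_8859_1)); // prints -27
    }
}
```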
If you call s.getBytes("ISO-8859-1") everywhere instead of s.getBytes(), I suspect you'll get back the right byte value... but relying on ISO-8859-1 for this is kinda dirty IMO.
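A minimal sketch of that ISO-8859-1 round trip, adapted from the code in the question, might look like this. The class name Latin1RoundTrip and the method roundTrip are hypothetical, and StandardCharsets.ISO_8859_1 (Java 7+) is used in place of the string name "ISO-8859-1".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Hypothetical helper: decode the bytes as ISO-8859-1, read one line,
    // and encode that line back to bytes with the same charset.
    public static byte[] roundTrip(byte[] input) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(input), StandardCharsets.ISO_8859_1))) {
            String s = r.readLine();
            return s.getBytes(StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] out = roundTrip(new byte[] {-27});
        System.out.println(out[0]); // prints -27
    }
}
```

One reason this stays fragile: readLine() drops line terminators, so input bytes 10 and 13 would not come back from this sketch, which is part of why a Reader is a poor fit for arbitrary binary data.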