Problem description
I am trying to understand character encoding in Java. Characters in Java are stored in 16 bits, using the UTF-16 encoding. So when I convert a string containing 6 characters to bytes, I expect 12 bytes, but I get 6, as shown below. Is there a concept I am missing?
package learn.java;

public class CharacterTest {
    public static void main(String[] args) {
        String str = "Hadoop";
        // getBytes() with no argument uses the platform default charset
        byte[] bt = str.getBytes();
        System.out.println("the length of character array is " + bt.length);
    }
}
Output: the length of character array is 6
As suggested by @Darshan, I tried getting the bytes with the UTF-16 encoding, but the result is still not what I expected.
package learn.java;

public class CharacterTest {
    public static void main(String[] args) {
        String str = "Hadoop";
        try {
            byte[] bt = str.getBytes("UTF-16");
            System.out.println("the length of character array is " + bt.length);
        } catch (java.io.UnsupportedEncodingException e) {
            // UTF-16 is always available, but do not swallow the exception silently
            e.printStackTrace();
        }
    }
}
Output: the length of character array is 14
Recommended answer
In the UTF-16 version, you get 14 bytes because a 2-byte byte-order mark (BOM) is inserted to distinguish between big-endian (the default) and little-endian. If you specify UTF-16LE, you get 12 bytes (little-endian, no byte-order mark added).
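The byte counts discussed above can be compared side by side with a minimal sketch like the following. The class name EncodingLengths and the helper byteCount are hypothetical names chosen for illustration. It also covers the 6-byte result from the question: getBytes() without an argument uses the platform default charset, and in charsets such as UTF-8 or ISO-8859-1 each of these ASCII letters takes one byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingLengths {
    // Number of bytes produced when encoding s with the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String str = "Hadoop";
        System.out.println("chars:      " + str.length());
        System.out.println("UTF-8:      " + byteCount(str, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + byteCount(str, StandardCharsets.ISO_8859_1));
        // UTF-16 prepends a 2-byte byte-order mark when encoding
        System.out.println("UTF-16:     " + byteCount(str, StandardCharsets.UTF_16));
        // The BE/LE variants fix the byte order, so no mark is written
        System.out.println("UTF-16BE:   " + byteCount(str, StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE:   " + byteCount(str, StandardCharsets.UTF_16LE));
    }
}
```

For "Hadoop" this should report 6 chars, 6 bytes for UTF-8 and ISO-8859-1, 14 bytes for UTF-16, and 12 bytes for each of UTF-16BE and UTF-16LE.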
EDIT - Use this program to look at the actual bytes generated by different encodings:
public class Test {
    public static void main(String args[]) throws Exception {
        // bytes of the first argument, encoded using the charset named by the second argument
        byte[] bs = args[0].getBytes(args[1]);
        System.err.println(bs.length + " bytes:");
        // print hex values of bytes and (if printable), the char itself
        char[] hex = "0123456789ABCDEF".toCharArray();
        for (int i = 0; i < bs.length; i++) {
            int b = bs[i] & 0xFF; // treat the signed byte as an unsigned value 0..255
            System.err.print(hex[b >> 4] + "" + hex[b & 0xf]
                + (!Character.isISOControl((char) b) ? "" + (char) b : ".")
                + ((i % 4 == 3) ? "\n" : " "));
        }
        System.err.println();
    }
}
For example, when running under UTF-8 (under other JVM default encodings, the characters for FE and FF would show up differently), the output is:
$ javac Test.java && java -cp . Test hello UTF-16
12 bytes:
FEþ FFÿ 00. 68h
00. 65e 00. 6Cl
00. 6Cl 00. 6Fo
and
$ javac Test.java && java -cp . Test hello UTF-16LE
10 bytes:
68h 00. 65e 00.
6Cl 00. 6Cl 00.
6Fo 00.
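The two dumps above also hint at why the marker matters when reading the bytes back. The following is a minimal sketch, with the hypothetical class name BomDecoding and helper decodeUtf16, that decodes both byte sequences. It relies on the documented behavior of Java's UTF-16 charset, which honors a leading byte-order mark when decoding and assumes big-endian if none is present.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BomDecoding {
    // Decode with the generic UTF-16 charset: a leading byte-order mark is
    // honored and consumed; without one, big-endian is assumed.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] withBom = "hello".getBytes(StandardCharsets.UTF_16);   // FE FF 00 68 ...
        byte[] leNoBom = "hello".getBytes(StandardCharsets.UTF_16LE); // 68 00 65 00 ...

        // The mark tells the decoder the byte order, so the text round-trips
        System.out.println(decodeUtf16(withBom).equals("hello"));
        // Little-endian bytes round-trip when the byte order is named explicitly
        System.out.println(new String(leNoBom, StandardCharsets.UTF_16LE).equals("hello"));
        // Without a mark the same bytes are read as big-endian, giving other characters
        System.out.println(decodeUtf16(leNoBom).equals("hello"));
    }
}
```

The three lines should print true, true, and false: the unmarked little-endian bytes are misread as big-endian, which is the ambiguity the byte-order mark exists to resolve.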