UTF-16 character encoding in Java

Problem description

I am trying to understand character encoding in Java. Characters in Java are stored in 16 bits, using UTF-16 encoding. So when I convert a string containing 6 characters to bytes, I get 6 bytes, as shown below, whereas I expected 12. Is there a concept I am missing?

package learn.java;

public class CharacterTest {

    public static void main(String[] args) {
        String str = "Hadoop";
        byte bt[] = str.getBytes();
        System.out.println("the length of character array is " + bt.length);
    }
}

Output: the length of character array is 6

As suggested by @Darshan, I also tried getting the bytes with the UTF-16 encoding, but the result is still not what I expected.

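package learn.java;

import java.io.UnsupportedEncodingException;

public class CharacterTest {

    // Declare the checked exception instead of swallowing it in an empty catch block
    public static void main(String[] args) throws UnsupportedEncodingException {
        String str = "Hadoop";
        byte[] bt = str.getBytes("UTF-16");
        System.out.println("the length of character array is " + bt.length);
    }
}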

Output: the length of character array is 14


Recommended answer

In the UTF-16 version, you get 14 bytes because a 2-byte byte-order mark (BOM) is inserted at the start to distinguish between big-endian (the default) and little-endian: 6 characters × 2 bytes + 2 bytes for the BOM = 14. If you specify UTF-16LE you will get 12 bytes (little-endian, no byte-order mark added).
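As a rough illustration of the byte counts discussed above, the sketch below encodes the same string with several charsets. It also covers the 6-byte result from the question: `getBytes()` with no argument uses the platform default charset rather than UTF-16, and an ASCII-only string takes one byte per character in common defaults such as UTF-8. The class name `EncodedLengths` and the helper `encodedLength` are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {

    // Hypothetical helper: number of bytes produced when encoding s with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String str = "Hadoop";

        // Platform-dependent: the no-argument getBytes() uses the default charset.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + str.getBytes().length);

        System.out.println("UTF-8:    " + encodedLength(str, StandardCharsets.UTF_8));    // 6
        System.out.println("UTF-16:   " + encodedLength(str, StandardCharsets.UTF_16));   // 14 (2-byte BOM + 12)
        System.out.println("UTF-16BE: " + encodedLength(str, StandardCharsets.UTF_16BE)); // 12 (no BOM)
        System.out.println("UTF-16LE: " + encodedLength(str, StandardCharsets.UTF_16LE)); // 12 (no BOM)
    }
}
```

Passing a `Charset` constant instead of a charset name also avoids the checked `UnsupportedEncodingException`.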


EDIT: Use this program to inspect the actual bytes generated by different encodings:

public class Test {
    public static void main(String args[]) throws Exception {
        // bytes in the first argument, encoded using second argument
        byte[] bs = args[0].getBytes(args[1]);
        System.err.println(bs.length + " bytes:");

        // print hex values of bytes and (if printable), the char itself
        char[] hex = "0123456789ABCDEF".toCharArray();
        for (int i=0; i<bs.length; i++) {
            int b = (bs[i] < 0) ? bs[i] + 256 : bs[i];
            System.err.print(hex[b>>4] + "" + hex[b&0xf]
                + ( ! Character.isISOControl((char)b) ? ""+(char)b : ".")
                + ( (i%4 == 3) ? "\n" : " "));
        }
        System.err.println();
    }
}

For example, when running with UTF-8 as the JVM default encoding (under other default encodings, the characters shown for FE and FF would look different), the output is:

$ javac Test.java  && java -cp . Test hello UTF-16
12 bytes:
FEþ FFÿ 00. 68h
00. 65e 00. 6Cl
00. 6Cl 00. 6Fo

$ javac Test.java  && java -cp . Test hello UTF-16LE
10 bytes:
68h 00. 65e 00.
6Cl 00. 6Cl 00.
6Fo 00.
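The byte sequences can also be cross-checked with a shorter sketch that prints only the hex values, which sidesteps the console-encoding issue with FE and FF. The class name `HexDump` and the helper `hex` are invented for this example; it is a minimal variant of the program above, not a replacement for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexDump {

    // Hypothetical helper: encodes s with cs and returns the bytes
    // as space-separated, two-digit uppercase hex values.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("hello", StandardCharsets.UTF_16));   // FE FF 00 68 00 65 00 6C 00 6C 00 6F
        System.out.println(hex("hello", StandardCharsets.UTF_16BE)); // 00 68 00 65 00 6C 00 6C 00 6F
        System.out.println(hex("hello", StandardCharsets.UTF_16LE)); // 68 00 65 00 6C 00 6C 00 6F 00
    }
}
```

The UTF-16 line starts with the BOM `FE FF` followed by big-endian code units, while UTF-16LE puts the low byte first and has no BOM.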
