Problem description

From Core Java, vol. 1, 9th ed., p. 69:

String sentence = "ℤ is the set of integers"; // for clarity; not in book
char ch = sentence.charAt(1);

doesn't return a space but the second code unit of ℤ.

But it seems that sentence.charAt(1) does return a space. For example, the if statement in the following code evaluates to true.

String sentence = "ℤ is the set of integers";
if (sentence.charAt(1) == ' ')
    System.out.println("sentence.charAt(1) returns a space");

Why?

I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.

Recommended answer

It sounds like the book is saying that 'ℤ' lies outside the basic multilingual plane (BMP), but in fact it is in the BMP.

Java uses UTF-16, with surrogate pairs for characters that are not in the basic multilingual plane. Since 'ℤ' (U+2124) is in the basic multilingual plane, it is represented by a single code unit. In your example, sentence.charAt(0) returns 'ℤ' and sentence.charAt(1) returns ' '.
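One way to check this yourself is to ask Java how many code units the first character of the string occupies. The sketch below is only an illustration: the class name BmpCheck and the helper codeUnitsForFirstChar are made up for this example, and the string uses the escape \u2124 for ℤ so that the result does not depend on the source file encoding. Character.isBmpCodePoint requires Java 7 or later.

```java
public class BmpCheck {
    // Hypothetical helper: number of UTF-16 code units (1 or 2)
    // used by the first code point of the string.
    static int codeUnitsForFirstChar(String s) {
        return Character.charCount(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // \u2124 is the double-struck capital Z from the question.
        String sentence = "\u2124 is the set of integers";

        int first = sentence.codePointAt(0);
        System.out.println("code point: U+" + Integer.toHexString(first).toUpperCase());
        System.out.println("in BMP: " + Character.isBmpCodePoint(first));
        System.out.println("code units: " + codeUnitsForFirstChar(sentence));
        System.out.println("charAt(1) is a space: " + (sentence.charAt(1) == ' '));
    }
}
```

For this string the helper reports one code unit, which is why charAt(1) already lands on the space.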

A character represented by a surrogate pair is made up of two code units. If the sentence started with such a character, sentence.charAt(0) would return the first code unit (the high surrogate), and sentence.charAt(1) would return the second code unit (the low surrogate).
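To see the behaviour the book describes, you need a character outside the basic multilingual plane. The sketch below assumes U+1D56B (a mathematical double-struck small z) purely as an example of such a character; it does not come from the question. The class name SurrogateDemo and the helper sample are likewise made up for illustration.

```java
public class SurrogateDemo {
    // Hypothetical helper: builds a string whose first character is
    // U+1D56B, a supplementary character chosen only for illustration.
    static String sample() {
        return new String(Character.toChars(0x1D56B)) + " is outside the BMP";
    }

    public static void main(String[] args) {
        String sentence = sample();

        // The first two code units are the two halves of one character.
        System.out.println(Character.isHighSurrogate(sentence.charAt(0)));
        System.out.println(Character.isLowSurrogate(sentence.charAt(1)));

        // Step over one whole code point to find the next character.
        int next = sentence.offsetByCodePoints(0, 1);
        System.out.println("next index: " + next);
        System.out.println("is space: " + (sentence.charAt(next) == ' '));
    }
}
```

Here charAt(1) returns the low surrogate, not the space; the space sits at the index returned by offsetByCodePoints(0, 1). When a string may contain supplementary characters, working with code points (codePointAt, offsetByCodePoints) is usually safer than indexing with charAt.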
