Unicode
Most Java program text consists of ASCII characters, but Unicode characters may be used as well: identifier names may contain Unicode letters and digits, and comments and string literals may contain any Unicode character (a character literal holds a single char, so it is limited to characters up to U+FFFF). For example, π (Greek small letter pi) is a valid Java identifier:
Code section 3.100: Pi.
double π = Math.PI;
It can also appear in a string literal:
Code section 3.101: Pi literal.
String pi = "π";
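Whether a character may appear in an identifier can be checked at run time with Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The sketch below (the class UnicodeIdentifiers and the helper isIdentifierShaped are hypothetical names) applies them code point by code point. It does not reject reserved words such as class, so it only approximates what the compiler accepts. Because the file contains a literal π, it must be compiled as UTF-8, which is the default from JDK 18 and can otherwise be requested with javac -encoding UTF-8.

```java
public class UnicodeIdentifiers {
    // Hypothetical helper: true if s is made only of characters the Java
    // identifier rules allow (reserved words such as "class" are not rejected).
    static boolean isIdentifierShaped(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters beyond U+FFFF are checked as a whole.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        double π = Math.PI;   // Greek letter as a variable name
        String pi = "π";      // the same letter inside a string literal
        System.out.println(π == Math.PI);               // true
        System.out.println(pi.length());                // 1
        System.out.println(isIdentifierShaped("π"));    // true
        System.out.println(isIdentifierShaped("2π"));   // false: may not start with a digit
        System.out.println(isIdentifierShaped("a+b"));  // false: '+' is not allowed
    }
}
```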
Unicode escape sequences
Unicode characters can also be expressed through Unicode escape sequences. These may appear anywhere in a Java source file (including inside identifiers, comments, and string literals), because the compiler translates them into the characters they stand for before it does any other processing of the source.
Unicode escape sequences consist of:
- a backslash '\' (ASCII character 92, hex 0x5c),
- a 'u' (ASCII 117, hex 0x75),
- optionally one or more additional 'u' characters, and
- four hexadecimal digits (the characters '0' through '9', 'a' through 'f', or 'A' through 'F').
A backslash that is itself escaped, that is, preceded by an odd number of backslashes, does not begin a Unicode escape.
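The translation described above can be modelled in a few lines. The sketch below (the class UnicodeEscapeDecoder and its methods are hypothetical, not part of the JDK) replaces each eligible escape with the UTF-16 code unit it names, accepts extra u characters, and skips backslashes that are themselves escaped. Unlike the real compiler, it copies malformed sequences through unchanged instead of reporting an error.

```java
public class UnicodeEscapeDecoder {
    private static final String HEX = "0123456789abcdefABCDEF";

    // True if the four characters starting at index i are ASCII hex digits.
    static boolean isHex4(String s, int i) {
        for (int k = i; k < i + 4; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    // Simplified model of the first translation step: an eligible backslash,
    // one or more 'u' characters and four hex digits are replaced by the
    // UTF-16 code unit they name. Malformed sequences are copied unchanged.
    static String decode(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashRun = 0; // raw backslashes directly before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // A backslash is eligible only after an even number of raw backslashes.
            if (c == '\\' && backslashRun % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++;
                }
                if (j > i + 1 && j + 4 <= src.length() && isHex4(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashRun = 0; // the escape ended with a hex digit
                    continue;
                }
            }
            out.append(c);
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0061"));    // a
        System.out.println(decode("\\uuu0061"));  // a
        System.out.println(decode("\\\\u0061").length()); // 7: left unchanged
    }
}
```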
Each sequence stands for a single UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. Since four hexadecimal digits only reach U+FFFF, a character beyond U+FFFF cannot be written as one escape; it has to be written as a surrogate pair, that is, two consecutive escapes.[1]
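The surrogate pair for a character beyond U+FFFF follows from a fixed UTF-16 formula: subtract 0x10000, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. The sketch below (the class SurrogatePairs and its method are hypothetical names) computes the pair by hand for U+1F600 and compares it with the escapes written in a string literal and with the standard Character.toChars.

```java
import java.util.Arrays;

public class SurrogatePairs {
    // Splits a supplementary code point (U+10000 to U+10FFFF) into its
    // UTF-16 high and low surrogates.
    static char[] surrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = surrogates(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00

        // Written as two escapes, the pair forms one character in a string.
        String face = "\uD83D\uDE00";
        System.out.println(face.length());                          // 2 (UTF-16 code units)
        System.out.println(face.codePointCount(0, face.length()));  // 1 (code point)
        System.out.println(face.equals(new String(pair)));          // true
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F600))); // true
    }
}
```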
Every character in a program may be expressed as a Unicode escape sequence, but such programs are hard for people to read (the Java compiler has no such difficulty), and they are much longer, since each character becomes six or more characters.
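To get a feel for how much longer a fully escaped program is, the hypothetical helper below rewrites every UTF-16 code unit of its input as a six-character escape; the 10-character statement int x = 1; grows to 60 characters.

```java
public class EscapeEverything {
    // Rewrites every UTF-16 code unit of src as a six-character Unicode escape.
    static String escapeAll(String src) {
        StringBuilder sb = new StringBuilder(src.length() * 6);
        for (int i = 0; i < src.length(); i++) {
            sb.append(String.format("\\u%04x", (int) src.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "int x = 1;";
        String escaped = escapeAll(line);
        System.out.println(escaped);
        System.out.println(line.length() + " -> " + escaped.length()); // 10 -> 60
    }
}
```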
A full list of the characters is available in the Unicode code charts.
π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:
Code section 3.102: Unicode escape sequences for Pi.
double \u03C0 = Math.PI;
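Since escape translation happens before identifiers are recognized, different spellings of the same character name the same variable. In the hypothetical method below, the variable is declared as \u03C0 and used as \uu03c0 (an extra u and lowercase hex digits); typing π directly would refer to the same variable as well.

```java
public class EscapedIdentifier {
    // Hypothetical example: one local variable spelled two different ways.
    static double circleArea(double r) {
        double \u03C0 = Math.PI;  // declared using the escape
        return \uu03c0 * r * r;   // extra 'u' and lowercase hex digits: same variable
    }

    public static void main(String[] args) {
        System.out.println(circleArea(2.0) == 4 * Math.PI); // true
    }
}
```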
The following demonstrates the use of Unicode escape sequences in other Java syntax:
Code section 3.103: Unicode escape sequences in a string literal.
// Declare Strings pi and quote which contain \u03C0 and \u0027 respectively:
String pi = "\u03C0";
String quote = "\u0027";
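Code section 3.103 also places escapes inside a comment. Escapes in comments are translated too, which is harmless for \u03C0 and \u0027 but not for an escape that produces a line terminator. The sketch below (hypothetical class and method names) shows the effect: \u000a ends a line comment early, so the rest of that line is compiled as ordinary code.

```java
public class EscapesInComments {
    // Hypothetical demo: escape translation happens before comments are
    // recognized, so an escaped line feed inside a line comment ends it.
    static int hiddenStatement() {
        int count = 0;
        // This escape ends the comment early: \u000a count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(hiddenStatement()); // 1, not 0
    }
}
```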
Note that a Unicode escape sequence behaves exactly like the character it stands for, because escapes are translated before string literals are parsed. For example, \u0022 (double quote, ") must be escaped inside a string literal just like " itself.
Code section 3.104: Double quote.
// Declare Strings doubleQuote1 and doubleQuote2 which both contain " (double quote):
String doubleQuote1 = "\"";
// \u005c is a backslash, so the next literal reads "\"" after escape translation.
String doubleQuote2 = "\u005c\u0022"; // "\u0022" alone doesn't work, since """ doesn't work.
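The same rule applies to a single quote inside a character literal. The sketch below (hypothetical class and constant names) checks both cases, and also shows that writing the backslash as \\ instead of \u005c does not work: an escaped backslash cannot begin a Unicode escape, so "\\u0022" is six ordinary characters.

```java
public class QuoteEscapes {
    // Backslash and double quote both written as escapes: reads as "\"" after translation.
    static final String DOUBLE_QUOTE = "\u005c\u0022";
    // The same trick for a single quote in a character literal.
    static final char SINGLE_QUOTE = '\u005c\u0027';
    // An escaped backslash disables the Unicode escape, leaving six characters.
    static final String SIX_CHARS = "\\u0022";

    public static void main(String[] args) {
        System.out.println(DOUBLE_QUOTE.equals("\""));  // true
        System.out.println(SINGLE_QUOTE == '\'');       // true
        System.out.println(SIX_CHARS.length());         // 6
    }
}
```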
International language support
The language distinguishes between bytes and characters: a byte is an 8-bit value, while a char is a 16-bit UTF-16 code unit. Characters were originally stored as UCS-2; since J2SE 5.0, the language also supports full UTF-16, in which characters beyond U+FFFF are represented by surrogate pairs of two char values. Java program source may contain any Unicode character.
The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:
Code listing 3.50: 哈嘍世界.java
public class 哈嘍世界 {
private String 文本 = "哈嘍世界";
}
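The difference between bytes and characters becomes visible as soon as text is encoded. The sketch below (the class BytesVersusChars and its helper are hypothetical names) reports, for three strings, the number of chars (UTF-16 code units), the number of code points, and the number of bytes in UTF-8. The CJK characters U+4E16 and U+754C and the supplementary character U+1F600 are written as escapes so that the file compiles under any source encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesVersusChars {
    // Returns {UTF-16 code units, code points, UTF-8 bytes} for s.
    static int[] measure(String s) {
        return new int[] {
            s.length(),
            s.codePointCount(0, s.length()),
            s.getBytes(StandardCharsets.UTF_8).length
        };
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        String cjk = "\u4E16\u754C";    // two CJK characters
        String emoji = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair
        System.out.println(Arrays.toString(measure(ascii))); // [5, 5, 5]
        System.out.println(Arrays.toString(measure(cjk)));   // [2, 2, 6]
        System.out.println(Arrays.toString(measure(emoji))); // [2, 1, 4]
    }
}
```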
References