

To us, a string such as "Hello world" looks like a series of letters with a space in the middle. To your computer, however, every String – in fact, everything – is a series of numbers.

ASCII

In our example, each character of the String "Hello world" is represented by a number between 0 and 127. For example, to the computer, the capital letter "H" is encoded as the number 72, whereas the space is encoded as the number 32. The ASCII standard, which grew out of earlier telegraph codes, specifies which number is used to represent each character.

On most Unix-like operating systems, you can view the entire chart of ASCII codes by typing "man ascii" at the shell prompt. Wikipedia's page on ASCII also lists the ASCII codes. Using an ASCII chart, we discover that our string "Hello world" gets converted into the following series of ASCII codes.

H  e   l   l   o   space  w   o   r   l   d
72 101 108 108 111 32     119 111 114 108 100
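
The same conversion can be sketched in Ruby itself. This is a minimal illustration, assuming Ruby 1.9 or later (where the ord method is available on strings); ascii_codes is a hypothetical helper name, not something Ruby provides.

```ruby
# ascii_codes is a hypothetical helper name used only for this illustration.
def ascii_codes(text)
  # each_char yields one-character strings; ord returns the code of each one.
  text.each_char.map { |char| char.ord }
end

# Prints the list of codes for the whole string, including the space.
p ascii_codes("Hello world")
```

The printed list should match the chart above, with 32 standing in for the space between the two words.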

You can also determine the ASCII code of a character by using the ? operator in Ruby 1.8.

puts ?H 
puts ?e
puts ?l
puts ?l
puts ?o

The question-mark syntax no longer returns a number in Ruby 1.9 and later; there, ?H evaluates to the one-character String "H". Instead, use the ord method.

puts "H".ord
puts "e".ord
puts "l".ord
puts "l".ord
puts "o".ord

Notice that the output (below) of this program matches the ASCII codes for the "Hello" part of "Hello world".

$ hello-ascii.rb
72
101
108
108
111
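
To see how the question-mark syntax behaves on a current interpreter, the following sketch may help. It assumes Ruby 1.9 or later; char_code is a hypothetical helper name chosen for this example.

```ruby
# char_code is a hypothetical helper name used only for this illustration.
def char_code(char)
  # ord returns the numeric code of the first character of the string.
  char.ord
end

literal = ?H            # a character literal
puts literal.class      # String on Ruby 1.9 and later
puts char_code(literal) # 72
```

Writing ?H.ord is therefore one way to keep the question-mark syntax while still getting the number 72.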

To get the ASCII value for a space, we need to use its escape sequence. In fact, we can use any escape sequence with the ? operator.

puts ?\s
puts ?\t
puts ?\b
puts ?\a

As above, in Ruby 1.9 and later use the ord method instead:

puts "\s".ord
puts "\t".ord
puts "\b".ord
puts "\a".ord

The result:

32
9
8
7

Terminal emulators

You may not realize it, but so far, you've been running your Ruby programs inside of a program called a terminal emulator – such as the Microsoft Windows console, the Mac OS X Terminal application, a telnet client, rxvt, or X Window System programs such as xterm.

When your Ruby program prints out the letter "H", it sends the ASCII code for "H" (72) to the terminal emulator, which then draws an "H". When your Ruby program prints out a bell character, it sends a different ASCII code – ASCII code 7 – to the terminal emulator. In this case, the terminal emulator does not draw a symbol, but instead will typically beep or flash briefly. How each of the codes gets interpreted is largely determined by the ASCII standard.
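
The bell example can be tried with a short sketch. It assumes the script runs inside a terminal emulator that honors the bell character (some are configured to ignore it); bell_character is a hypothetical helper name.

```ruby
# bell_character is a hypothetical helper name used only for this illustration.
def bell_character
  # Integer#chr builds the one-character string for a given code.
  7.chr
end

puts bell_character == "\a" # true: code 7 and the \a escape are the same character
print "H"                   # the terminal draws the glyph for code 72
print bell_character        # the terminal typically beeps or flashes instead
puts
```

Using print rather than puts keeps Ruby from adding a newline after each character, so only the codes shown are sent to the terminal.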

Other character encodings

The ASCII standard is one kind of character encoding. As mentioned above, ASCII uses only the numbers 0 through 127 to define characters, and the world's writing systems contain far more characters than that. Other character encodings – such as Latin-1, Shift_JIS, and the Unicode Transformation Format (UTF) – have been created to represent a wider variety of characters, including those found in languages such as Arabic, Hebrew, Chinese, and Japanese.
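
As a rough illustration of how encodings differ, the sketch below looks at the letter é, which lies outside ASCII. It assumes Ruby 1.9 or later; byte_values is a hypothetical helper name, and "ISO-8859-1" is the name Ruby uses for Latin-1.

```ruby
# byte_values is a hypothetical helper name used only for this illustration.
def byte_values(text, encoding_name)
  # encode converts the string to the named encoding; bytes lists the raw numbers.
  text.encode(encoding_name).bytes.to_a
end

e_acute = "\u00e9"                   # the letter é, written as a Unicode escape

p "Hello".ascii_only?                # true: every character fits in 0..127
p e_acute.ascii_only?                # false
p byte_values(e_acute, "UTF-8")      # [195, 169]: two bytes in UTF-8
p byte_values(e_acute, "ISO-8859-1") # [233]: one byte in Latin-1
```

The same character ends up as different numbers depending on the encoding, which is why a program needs to know which encoding a piece of text uses.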

ASCII chart

Binary Oct Dec Hex Glyph
010 0000 040 32 20 (space)
010 0001 041 33 21 !
010 0010 042 34 22 "
010 0011 043 35 23 #
010 0100 044 36 24 $
010 0101 045 37 25 %
010 0110 046 38 26 &
010 0111 047 39 27 '
010 1000 050 40 28 (
010 1001 051 41 29 )
010 1010 052 42 2A *
010 1011 053 43 2B +
010 1100 054 44 2C ,
010 1101 055 45 2D -
010 1110 056 46 2E .
010 1111 057 47 2F /
011 0000 060 48 30 0
011 0001 061 49 31 1
011 0010 062 50 32 2
011 0011 063 51 33 3
011 0100 064 52 34 4
011 0101 065 53 35 5
011 0110 066 54 36 6
011 0111 067 55 37 7
011 1000 070 56 38 8
011 1001 071 57 39 9
011 1010 072 58 3A :
011 1011 073 59 3B ;
011 1100 074 60 3C <
011 1101 075 61 3D =
011 1110 076 62 3E >
011 1111 077 63 3F ?
100 0000 100 64 40 @
100 0001 101 65 41 A
100 0010 102 66 42 B
100 0011 103 67 43 C
100 0100 104 68 44 D
100 0101 105 69 45 E
100 0110 106 70 46 F
100 0111 107 71 47 G
100 1000 110 72 48 H
100 1001 111 73 49 I
100 1010 112 74 4A J
100 1011 113 75 4B K
100 1100 114 76 4C L
100 1101 115 77 4D M
100 1110 116 78 4E N
100 1111 117 79 4F O
101 0000 120 80 50 P
101 0001 121 81 51 Q
101 0010 122 82 52 R
101 0011 123 83 53 S
101 0100 124 84 54 T
101 0101 125 85 55 U
101 0110 126 86 56 V
101 0111 127 87 57 W
101 1000 130 88 58 X
101 1001 131 89 59 Y
101 1010 132 90 5A Z
101 1011 133 91 5B [
101 1100 134 92 5C \
101 1101 135 93 5D ]
101 1110 136 94 5E ^
101 1111 137 95 5F _
110 0000 140 96 60 `
110 0001 141 97 61 a
110 0010 142 98 62 b
110 0011 143 99 63 c
110 0100 144 100 64 d
110 0101 145 101 65 e
110 0110 146 102 66 f
110 0111 147 103 67 g
110 1000 150 104 68 h
110 1001 151 105 69 i
110 1010 152 106 6A j
110 1011 153 107 6B k
110 1100 154 108 6C l
110 1101 155 109 6D m
110 1110 156 110 6E n
110 1111 157 111 6F o
111 0000 160 112 70 p
111 0001 161 113 71 q
111 0010 162 114 72 r
111 0011 163 115 73 s
111 0100 164 116 74 t
111 0101 165 117 75 u
111 0110 166 118 76 v
111 0111 167 119 77 w
111 1000 170 120 78 x
111 1001 171 121 79 y
111 1010 172 122 7A z
111 1011 173 123 7B {
111 1100 174 124 7C |
111 1101 175 125 7D }
111 1110 176 126 7E ~
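
The rows of this chart can also be generated rather than looked up. The following is a minimal sketch, assuming Ruby 1.9 or later; ascii_chart_row is a hypothetical helper name, and the space character is labeled "(space)" so that it stays visible.

```ruby
# ascii_chart_row is a hypothetical helper name used only for this illustration.
def ascii_chart_row(code)
  glyph = code == 32 ? "(space)" : code.chr
  # Binary is split into the high 3 bits and the low 4 bits, as in the chart.
  format("%03b %04b %03o %d %02X %s", code >> 4, code & 0b1111, code, code, code, glyph)
end

puts "Binary Oct Dec Hex Glyph"
(32..126).each { |code| puts ascii_chart_row(code) }
```

Codes 0 through 31 and code 127 are left out because they are control characters with no printable glyph.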