How does a computer understand text?

Every word is made up of symbols, or characters. When you press a key on a keyboard, the computer generates a number that represents the symbol for that key. This number is called a character code, and a complete collection of characters together with their codes is called a character set.
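As a quick illustration (an addition, not part of the original activity), the short Python sketch below shows a character code in action: ord() gives the number that represents a character, and chr() turns the number back into the character.

    # ord() gives the character code for a symbol; chr() does the reverse.
    key = "A"
    code = ord(key)      # 65 in the ASCII character set
    print(code)          # 65
    print(chr(code))     # A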

ASCII

ASCII is the code used to represent alphanumeric data in most computers. It stands for “American Standard Code for Information Interchange”. It works like any other code: one thing represents another. In ASCII, binary patterns are used to represent our numbers, letters and symbols. At a very basic level we use ASCII because computers store all information in binary, so we need some way to encode numbers, letters and symbols as binary values. At first many different character sets were in use, so on one system the code 0100001 might represent A, but on another it could be P, or Y! Obviously this situation was far from ideal: a standard character set was needed so that different systems could be consistent with one another. This is where ASCII came in. It includes codes for:

  • All the main alphabetic characters, upper and lower case
  • All the numeric symbols, 0-9
  • 32 punctuation and other symbols, and ‘space’
  • 32 codes reserved for non-printable control codes

How many codes in total? Each character (or control code) is represented as a number from 0 to 127, stored as a binary value in 7 bits, giving 128 codes in total. The remaining bit of each byte was traditionally reserved for error checking (a parity bit).
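To see this in practice, here is a short Python sketch (added for illustration) that prints the denary code and the 7-bit binary value for a few characters; every ASCII code lies between 0 and 127, so 7 bits are always enough.

    # Print each character's ASCII code in denary and as a 7-bit binary value.
    for character in ["A", "a", "0", " "]:
        code = ord(character)
        print(repr(character), code, format(code, "07b"))
    # 'A' 65 1000001
    # 'a' 97 1100001
    # '0' 48 0110000
    # ' ' 32 0100000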

Try this experiment: open a new file in Notepad and type the sentence "Four score and seven years ago" into it. Save the file to disk under the name getty.txt, then use your file explorer to look at the size of the file. You will find that the file takes up 30 bytes on disk: 1 byte for each character. If you add another word to the end of the sentence and re-save it, the file size will jump by the appropriate number of bytes. Each character consumes one byte.
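If you prefer to try the same experiment in code, the Python sketch below reproduces it: it writes the sentence to getty.txt using a plain ASCII encoding and then asks the operating system for the file size, which comes back as 30 bytes, one per character.

    import os

    # Write the sentence using a plain ASCII encoding: one byte per character.
    with open("getty.txt", "w", encoding="ascii") as f:
        f.write("Four score and seven years ago")

    print(os.path.getsize("getty.txt"))  # 30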


Decode some ASCII!

Convert each binary code into denary, then use the ASCII table below to find the character that corresponds to that denary number.

 
[ASCII table]

If you want to learn about binary conversion you can head here. If you know binary, start converting!
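If you would rather check your working with a program, here is a small Python sketch of the same process: split the message on spaces, convert each binary code to denary with int(code, 2), and look the result up with chr(). The example message used here is not one of the puzzles below, so it will not spoil any answers.

    def decode(binary_message):
        # One space-separated binary code per character.
        codes = binary_message.split()
        return "".join(chr(int(code, 2)) for code in codes)

    print(decode("01001000 01101001"))  # Hi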

* What kind of guns do bees use? 01000010 01100101 01100101 01000010 01100101 01100101 00100000 01100111 01110101 01101110 01110011

* What do you call a bee who is having a bad hair day? 01000001 00100000 01000110 01010010 01001001 01010011 01000010 01000101 01000101 00100001

* Why do bees have sticky hair? 01001000 01101111 01101110 01100101 01111001 00101101 01100011 01101111 01101101 01100010 01110011 00100001

* What do you get when you cross a sheep and a bee? 01000001 00100000 01100010 01100001 01101000 00101101 01101000 01110101 01101101 01100010 01110101 01100111 00101110

Unicode

ASCII can only store 128 characters, which is enough for English text but not for other languages. If you want to use accented characters in European languages, or larger writing systems such as Cyrillic (used for Russian) or Chinese, then many more characters are needed. Therefore another standard, called Unicode, was created. Unicode uses between 8 and 32 bits per character, so it can represent characters from languages all around the world, and it is commonly used across the internet. Because each character can take up more bits than in ASCII, Unicode text might take up more storage space when documents are saved. Global companies, like Facebook and Google, would not use the ASCII character set because their users communicate in many different languages.
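The Python sketch below illustrates the "between 8 and 32 bits" point using the common UTF-8 encoding of Unicode: ordinary ASCII letters still take one byte, while accented letters, Cyrillic, Chinese characters and emoji take two, three or four bytes.

    # Encode each character as UTF-8 and count how many bytes it needs.
    for character in ["A", "é", "Я", "中", "😀"]:
        print(character, len(character.encode("utf-8")), "byte(s)")
    # A 1 byte(s)
    # é 2 byte(s)
    # Я 2 byte(s)
    # 中 3 byte(s)
    # 😀 4 byte(s)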