More than a SpreadsheetEdit
Let's start with a simple example. Suppose we want to collect information about people - their name, place of birth and some more items. In the beginning we might consider to collect this data in a simple spreadsheet. But what if we grow to a successful company and have to handle millions of those data items? Could a spreadsheet deal with this huge amount of information? Could several employees or programs simultaneously insert new data, delete or change it? Of course not. And this is one of the noteworthy advantages of a DBMS over a spreadsheet program: we can imagine the structure of a table as a simple spreadsheet - but the access to it is internally organized in a way that huge amounts of data can be accessed by a lot of users at the same time.
In summary it can be said that one can imagine a table as a spreadsheet optimized for bulk data and concurrent access.
Conceive the StructureEdit
To keep control and to ensure a good performance, tables are subject to a few strict rules. Every table column has a fixed name and the values of each column must be of the same data type. Furthermore, it is highly recommended - though not compulsory - that each row can be identified by a unique value. The column, in which this identifying value resides, is called the Primary Key. In this Wikibook we always name it id. But everybody is free to choose a different name. Furthermore we may use the concatenation of more than one column as the Primary Key.
At the beginning we have to decide the following questions:
- What information units of persons (in this first example) do we want to save? Of course there is a lot of information about persons (e.g.: eye color, zodiacal sign, ...), but every application needs only some of them. We have to decide which ones are of interest in our concrete context.
- What names do we assign to the selected information units? Each of the identified information units goes to a column of the table, which needs to have a name.
- Of what data type are the information units? All data values within one column must be of the same data type. We cannot put an arbitrary string into a column of data type DATE.
In our example we decide to save first name, last name, date and place of birth, social security number, and the person's weight. Obviously date of birth is of data type DATE, the weight is a number and all others are some kind of strings. For strings there is a distinction between those that have a fixed length and those in which the length usually varies greatly from row to row. The former is named CHAR(<n>), where <n> is the fixed length, and the others VARCHAR(<n>), where <n> is the maximum length.
The decisions previously taken must be expressed in a machine-understandable language. This language is SQL, which acts as the interface between end users - or between special programms - and the DBMS.
-- comment lines starts with two consecutive minus signs followed by a space '-- ' CREATE TABLE person ( -- define columns (name / type / default value / nullable) id DECIMAL NOT NULL, firstname VARCHAR(50) NOT NULL, lastname VARCHAR(50) NOT NULL, date_of_birth DATE, place_of_birth VARCHAR(50), ssn CHAR(11), weight DECIMAL DEFAULT 0 NOT NULL, -- select one of the defined columns as the Primary Key and -- guess a meaningfull name for the Primary Key constraint: 'person_pk' may be a good choice CONSTRAINT person_pk PRIMARY KEY (id) );
We choose person as the name of the table, which consists of seven columns. One of them plays the role of the Primary Key: id. We can store exclusively digits in the column id and weight, strings in a length up to 50 characters in firstname, lastname and place_of_birth, dates in date_of_birth and a string of exactly eleven characters in ssn. The phrase NOT NULL is part of the definition of id, firstname, lastname and weight. This means that in every row there must be a value for those four columns. Storing no value in any of those columns is not possible - but the 8-character-string 'no value' or the digit '0' are allowed because they are values. Or to say it the other way round: it is possible to omit the values of date_of_birth, place_of_birth and ssn.
The definition of a Primary Key is called a 'constraint' (later on we will get to know more kinds of constraints). Every constraint should have a name - it's person_pk in this example.
After execution of the above 'CREATE TABLE' command the DBMS has created an object that one can imagine similar to the following Wiki-table:
id firstname lastname date_of_birth place_of_birth ssn weight
This Wiki-table shows 4 lines. The first line represents the names of the columns - no values! The following 3 lines are for demonstration purposes only. But in the database table exists currently no single row! She is completely empty, no rows at all, no values at all! The only thing that exists in the database is the structure of the table.
Back to StartEdit
Maybe we want to delete the table one day. To do so we can use the DROP command. It removes the table totally: all data and the complete structure are thrown away.
DROP TABLE person;
Don't confuse the DROP command with the DELETE command, which we present on the next page. The DELETE comand removes only rows - possibly all of them. However, the table itself, which holds the definition of the structure, keeps retained.