Relational Database Design/Basic Concepts

Set theory

The theory of relational databases is built upon the mathematical theory of sets. In mathematics, a set is one of the most general ways of considering more than one object at once, and a substantial amount of theory has been built up to describe this. You won't need to know too much set theory to understand relational databases, but it helps to have the basic concepts straight.

A set is simply a collection of zero or more objects, all of which are different, without any further structure. The following are all sets:

$\{1,2,3\}\qquad \{a,b\}\qquad \{-371,r,4.5\}$

Note that the definition says zero or more: a set with zero members is still a set, even though it is empty. The set with zero elements is written as a pair of empty braces $\{\}$ , and is often represented by the symbol $\emptyset$ .

If two sets have the exact same elements, then they are the same set. There's nothing special about one set that can distinguish it from others, apart from the elements it contains. The order of the elements isn't important, so the sets are the same no matter what order we choose to write the elements in:

$\{1,2,3\}=\{3,2,1\}$

Anything can be put into a set, not just mathematical concepts such as numbers. You could have the set of all US presidents, and deal with it in set theory the same as any other set:

$S=\{{\text{George Washington}},{\text{John Adams}},...,{\text{Barack Obama}}\}$

Even sets can be members of sets:

$S=\{1,\{red,orange,yellow\},42\}$

Sets can be infinite, for example the set of all positive whole numbers.

Names and objects

Databases are useful when they tell us something about the real world. But things in the real world are dealt with in a fuzzy way that can trip us up when we translate them into a database model.

Names are a good example. When we need to refer to something, we'll give it a name. People don't often think about it, but the name isn't the same thing as the object. "Mark Twain" is a string of characters that we usually use to refer to the author Mark Twain, but the two things aren't the same: "Mark Twain" consists of ten characters, but it's not correct to say that Mark Twain consists of ten characters.

It's also possible for the same object to have several names. "Mark Twain" and "Samuel Clemens" are both strings that can be understood to refer to the author of The Adventures of Tom Sawyer.

This becomes important because you can't put real-world objects into a database, only pieces of information that identify them. You would like to choose identifying information that is unique for each object and can't be confused, but often this is more difficult than it first appears. If your means of identifying objects is fuzzy, then regardless of how rigorous your database is, the conclusions you draw will still be fuzzy.

Entities and Classes

Databases refer to objects in the real world, but not always in the same way. A database recording sales for a car dealership may track the individual cars that are sold, so that we know exactly which car has been sold to which customer. The database that runs the manufacturer's web site, on the other hand, probably deals with facts about the model lines as a whole, i.e. about the whole class of such cars.

This is important to be aware of because language doesn't always make it plain. "Joe Smith bought a Ford Mondeo yesterday" uses the same language as "a Ford Mondeo has four wheels", but the former refers to an individual car, while the latter is a general statement about every Ford Mondeo. The latter statement may even be true if there are no Ford Mondeos in existence (e.g. after the model had been designed, but before one had been manufactured). Don't let the language confuse you, these are entirely different from the point of view of database design.

Terms

There are two sets of relational database terminology in use. The original developers of the relational theory approached it from a theoretical perspective and used terminology that came from set theory and formal logic. These terms never caught on among practitioners, who prefered to use more intuitive and practical terms, which were eventually enshrined in the SQL standard.

Relational term	SQL equivalent
relation, base relvar	table
derived relvar	view, query result, result set
tuple	row
attribute	column

Most modern publications about databases use the SQL terms as described below, but you should be aware of the different terminology. The pairs of terms are not entirely synonymous, so some writers on relational theory (e.g. Chris Date and Fabian Pascal) prefer to use the strict relational terminology.

Domains

Domains are the set of allowable data values for a Column. For example, the FiveDigitZipCode Column on the customer entity can be in the integer domain. As such, the database would not allow you to place values like 123.45 (floating point) or ABC (character) into that Column.

Some authors draw a distinction between a domain and a type in the fact that a type is a fundamental concept built into the Database Management System (DBMS) (e.g. string, integer, floating point) while a domain can have additional business rules about what values are acceptable. For example, if you have a database storing scores in ten-pin bowling, the score for a game will be of integer type, but the rules of the game (it's impossible to score more than 300) mean that the domain of the score would be integers between 0 and 300. The additional constraints on the domain make it harder for bad data to be inserted into the database.

Columns

Columns are the attributes that describe an entity in the database model. For example, the customer entity may have attributes for First Name, Last Name, Address, City, State, and FiveDigitZipCode.

Rows

A Row is a complete set of Columns that describe the entity that you are trying to model.

Tables

Tables are collections of Rows that act as a logical groupings of entities.

Databases

A collection of related Tables and any supporting objects (e.g. stored procedures) is often referred to as a Database (or schema). Multiple Databases are usually logically separate from one another.

The term 'database' is sometimes used loosely to refer to the software that manages the database. To avoid ambiguity it is standard in more formal contexts to refer to the software as a Database Management System or DBMS—or more specifically a Relational Database Management System or RDBMS.

Arity

Arity refers to the number of columns in a table. If a table has five columns (as in the case of the customer entity above), we say it is of arity 5.

Cardinality

Cardinality refers to the number of elements in a set. In a relational database context, this usually means the number of rows in a table (since a table can be viewed as a set of rows). The cardinality of a table is based on the rows in a table. For example, an Employee table may only have one employee row per employee the cardinality is one. The employee may have 3 phones, cell, work and home. The phone table would have a cardinality of 3 for that employee's phones.