Relational Database Design/Constraints

Constraints are ways to further constrain data above and beyond what Domains allow. Constraints are also used to enforce Referential Integrity, which is what prevents logically incomplete data from residing in the database.

Primary keys

A Primary Key is a Column that uniquely identifies a particular Row in a Table. For example, a person entity may have a Column for SSN (Social Security Number). If in your data model each person has a unique SSN, then it may be a candidate for a Primary Key. (Primary Keys can consist of two or more Columns, but this is not covered here.)

Primary Keys are also the means by which Foreign Keys work. Because of this, SSN may actually not be a good choice as a Primary Key. In practice, Rows often have a unique numeric identifier (often called an identity or sequence value) that uniquely identifies a particular Row. These kinds of values are often used as Primary Keys.

It should be noted that RDBMSes often use a Table's Primary Key's column(s) to automatically create a Structured Index on that Table. A Structured Index is an index that physically re-orders the data to match the index. This is done to improve query performance, but can actually hurt performance if the wrong column(s) are used as the Primary Key.

In relational theoretical terms, a primary key is a chosen Candidate key, a minimal set of attributes whose combination of instances in every row (tuple) is always unique and identifies the row (tuple). A candidate key is a minimal superkey, a superkey being any set of attributes (columns) which will identify the row , and the largest superkey is the entire set of columns of the table ( attributes of the relation).

Foreign keys

A Foreign Key is a way to further constrain the allowable values of a Column to data that exists in another Table. For example, if you have to process orders in your system, you may create a Table called OrderInfo to store order information. An order has to be associated with a customer, so you may have a Column in the OrderInfo Table called CustomerID that somehow connects to an associated Row in the Customer Table.

Most likely you do not want to be able to create orders for customers that do not exist, and you would not want to delete a customer that is associated with any orders. Doing so would break the Referential Integrity of the data. A Foreign Key relationship ensures that these two rules are enforced.

By creating a Foreign Key relationship between OrderInfo's CustomerID column and the Primary Key of the Customer Table, the RDBMS will ensure that CustomerID always refers to a single existing Row in the Customer Table, and will also prevent you from deleting that associated Row because one or more Rows in OrderInfo depend on it.

Usually, the table with the foreign key constraint is referring to another table by that table's primary key attribute(s). In a many-to-one relationship, for instance Orders is many, and Customer is one, there are many Order rows per Customer row, so the foreign key resides on the Order table. Customarily, the foreign key field names are the same as the primary key field name of the table being referred to, so it is probably a good idea to call the primary key on each table with redundant naming like "TABLENAME_ID" e.g. Customer_ID.

Other Constraints

It is arguable, that the most important constraints are foreign key and primary key constraints, because the process of normalization (see below), pushes most of the data integrity checking onto the primary keying and joining ( retrieving rows using a foreign key in one table, and a primary key table in another table).

Some DBMS provide a logical CHECK constraint, where the body of the CHECK involves some sort of condition on one or more fields .

NOT NULL and UNIQUE are constraints applied to individual fields in the data declaration statement CREATE TABLE ( f1 type1 PRIMARY KEY, f2 type2 UNIQUE , ... CHECK (..) )