Relational Database Design/Introduction

What is a database?

edit

Computer programs operate on data to produce results. A program can store this data in its memory while running, but often we need data to last beyond the lifetime of the program or share the data between more than one run of the program. Programmers could write new code for data storage with every new program they write, but this would lead to duplicated effort since much of the code would be the same each time. At its broadest, database technology concerns general solutions to the problems of storing, managing and retrieving data.

In terms of computer science, we can simply say that a database is a collection of coherent information. While an information is a processed form of data. Databases are mainly concerned with structured data: data that is stored according to some kind of regular pattern enabling a computer to filter and process it, and infer answers to questions that are not explicitly set out in the data as originally input. A program that can store information and return it unmodified (such as a file system) would not normally be referred to as a database.

The most general way of thinking about a database is as a collection of facts about the world. You want to know whether a fact is true (e.g. "Is Jim the account manager for MultiWidget Corp?") or what the missing object is in a fact (e.g. "What is the capital of Uganda?"). You can transform your question into a form appropriate to your database, then the database can give you an answer. Databases vary a lot in how you state the question and which questions can be answered efficiently, but at some abstract level it always follows this pattern.

What is a data model?

edit

Databases aim to be general solutions to the problem of data management, so as a designer of database software you want the system to be able to be used in as many situations as possible without requiring customisation to your clients' application. However, facts about the world in general take all sorts of different forms. How can the same piece of software work with everything from human relationships to industrial processes without being massively complex?

The solution lies in a data model: a set of rules, assumptions and conventions that assist in transforming loose concepts in the real world into a finite and objective model in the computer by means of a fixed number of reusable concepts. The relational model is one of the most widely understood and commercially successful data models, but it is not the only one.

What is a relational database?

edit

A relational database is a database that organizes its data into collections of Tables, Rows, Attributes, and Domains. Predicate logic is used to both describe the information contained in the database and to query information from it. (See Wikipedia: Relational model)

A word about formality

edit

We can talk about databases on several different levels, from very intuitive descriptions right down to precise mathematical formalism. When talking about databases there is often a tension between writing in a way that is comprehensible and writing in a way that is formally correct. In this book we will try to remain true to the theory where possible, but not at the expense of making the ideas difficult to understand. Where necessary, this will involve a looser intuitive discussion of a topic (which may contain elements that are technically incorrect) to establish a level of understanding before treating the topic more formally.

A similar tension exists in choosing examples to illustrate particular points. There are many rules about what constitutes good database design, but often following the rules serves to complicate the example and make it harder to understand. Within this book we will focus on examples that illustrate well the point under discussion, and trust the reader to understand that design choices in the examples that are incidental to the topic in question are not necessarily to be followed.

The relational model and SQL implementations

edit

The relational model has been by far the most widely implemented data model, but the vast majority of implementations are based around Structured Query Language (SQL). The SQL standard fails to properly follow the relational data model in several important respects, for example in allowing duplicate rows to appear in tables. Since there are no commercial-grade database tools that use the relational model in its pure form without SQL, many books treat the relational model and SQL as inseparable and gloss over the tension between them.

This book will discuss relational design in SQL terms, but will also point out places where the relational model should work differently.