Structured Query Language/Database Management Systems (DBMS)


Historical Context edit

One of the original scopes of computer applications was storing large amounts of data on mass storage devices and retrieving them at a later point in time. Over time user requirements increased to include not only sequential access but also random access to data records, concurrent access by parallel (writing) processes, recovery after hardware and software failures, high performance, scalability, etc. In the 1970s and 1980s, the science and computer industries developed techniques to fulfill those requests.

What makes up a Database Management System? edit

Basic bricks for efficient data storage - and for this reason for all Database Management Systems (DBMS) - are implementations of fast read and write access algorithms to data located in central memory and mass storage devices like routines for B-trees, Index Sequential Access Method (ISAM), other indexing techniques as well as buffering of dirty and non-dirty blocks. These algorithms are not unique to DBMS. They also apply to file systems, some programming languages, operating systems, application servers, and much more.

In addition to the appropriation of these routines, a DBMS guarantees compliance with the ACID paradigm. This compliance means that in a multi-user environment all changes to data within one transaction are:

Atomic: all changes take place or none.
Consistent: changes transform the database from one valid state to another valid state.
Isolated: transactions of different users working at the same time will not affect each other.
Durable: the database retains committed changes even if the system crashes afterward.

Classification of DBMS Design edit

A distinction between the following generations of DBMS design and implementation can be made:

  • Hierarchical DBMS: Data structures are designed in a hierarchical parent/child model where every child has exactly one parent (except the root structure, which has no parent). The result is that the data is modeled and stored as a tree. Child rows are physically stored directly after the owning parent row. So there is no need to store the parent's ID or something like it within the child row (XML realizes a similar approach). If an application processes data in exactly this hierarchical way, it is speedy and efficient. But if it's necessary to process data in a sequence that deviates from this order, access is less efficient. Furthermore, hierarchical DBMSs do not provide the modeling of n:m relations. Another fault is that there is no possibility to navigate directly to data stored in lower levels. You must first navigate over the given hierarchy before reaching that data.
The best-known hierarchical DBMS is IMS from IBM.
  • Network DBMS: The network model designs data structures as a complex network with links from one or more parent nodes to one or more child nodes. Even cycles are possible. There is no need for a single root node. In general, the terms parent node and child node lose their hierarchical meaning and may be referred to as link source and link destination. Since those links are realized as physical links within the database, applications that follow the links show good performance.
  • Relational DBMS: The relational model designs data structures as relations (tables) with attributes (columns) and the relationship between those relations. Definitions in this model are expressed in a purely declarative way, not predetermining any implementation issues like links from one relation to another or a certain sequence of rows in the database. Relationships are based purely upon content. At runtime, all linking and joining is done by evaluating the actual data values, e.g.: ... WHERE employee.department_id = department.id .... The consequence is that - except for explicit foreign keys - there is no meaning of a parent/child or owner/member denotation. Relationships in this model do not have any direction.
The relational model and SQL are based on the mathematical theory of relational algebra.
During the 1980s and 1990s, proprietary and open-source DBMS's based on the relational design paradigm established themselves as market leaders.
  • Object-oriented DBMS: Nowadays, most applications are written in an object-oriented programming language (OOP). If, in such cases, the underlying DBMS belongs to the class of relational DBMS, the so-called object-relational impedance mismatch arises. That is to say, in contrast to the application language, pure relational DBMS (prDBMS) does not support central concepts of OOP:
Type system: OOPs do not only know primitive data types. As a central concept of their language, they offer the facility to define classes with complex internal structures. The classes are built on primitive types, system classes, references to other or the same class. prDBMS knows only predefined types. Secondary prDBMS insists in first normal form, which means that attributes must be scalar. In OOPs they may be sets, lists or arrays of the desired type.
Inheritance: Classes of OOPs may inherit attributes and methods from their superclass. This concept is not known to prDBMS.
Polymorphism: The runtime system can decide via late binding which one of a group of methods with the same name and parameter types will be called. This concept is not known by prDBMS.
Encapsulation: Data and access methods to data are stored within the same class. It is not possible to access the data directly - the only way is using the access methods of the class. This concept is not known to prDBMS.
Object-oriented DBMS are designed to overcome the gap between prDBMS and OOP. At their peak, they reached a weak market position in the mid and late 1990s. Afterward, some of their concepts were incorporated into the SQL standard as well as rDBMS implementations.
  • NewSQL: This class of DBMS seeks to provide the same scalable performance as NoSQL systems while maintaining the ACID paradigm, the relational model, and the SQL interface. They try to reach scalability by eschewing heavyweight recovery or concurrency control.