Structured Query Language/Database Management Systems (DBMS)
Historical Context
One of the original scope of computer applications was storing large amount of data on mass storage devices and retrieve them at a later point in time. Over time user requirements increased: not only sequential access but also random access to data records, concurrent access by parallel (writing) processes, recovery after hardware and software failures, high performance, scalability, ... . In the 1970s and 1980s science and computer industry developed techniques to fulfill those requests.
Define Query languages?
What makes up a Database Management System?
Basic bricks for efficient data storages - and for this reason for all DBMS - are implementations of fast read and write access algorithms to data located in central memory and mass storage devices like routines for B-trees, Index Sequential Access Method (ISAM), other indexing techniques as well as buffering of dirty and non-dirty blocks. Those algorithms are not unique to DBMS. They are also applied to file systems, some programming languages, operating systems, application server and many more.
Additionally to the appropriation of those routines DBMS guarantees compliance to the ACID paradigm. This means, that also in a multi-user environment all changes to data within one transaction are:
- Atomic: Either all changes take place or none.
- Consistent: The changes transform the database from one valid state to another valid state.
- Isolated: Transactions of different users working at the same time will not affect each other.
- Durable: Commited changes will keep into the database even if the system crashes afterwards.
Classification of DBMS Design
You can distinguish between the following generations of DBMS design and implementation:
- Hierarchical DBMS: Data structures are designed in a hierarchical parent/child model where every child has exactly one parent (with the exception of the root structure, which has no parent). The result is that the data is modeled and stored as a tree. Child rows are physical stored directly behind the owning parent row. So there is no need to store the parents ID or something alike within the child row. (XML realize a similar approach.) If an application processes data in exactly this hierarchical way, it is very fast and efficient. On the other hand other access ways are inefficient. Furthermore hierarchical DBMS do not provide the modeling of n:m relations. Another fault is that you have no possibility to navigate directly to data stored in lower levels. You must first navigate over the given hierarchy before reaching that data.
- The best known hierarchical DBMS is IMS from IBM.
- Network DBMS: The network model designs data structures as a complex network with links from one or more parent nodes to one or more child nodes. Even cycles are possible. There is no necessity for a single root node. In general the terms parent node and child node lose their hierarchical meaning and may be referred as link source and link destination. As those links are realized as physical links within the database, applications which follow the links show good performance.
- Relational DBMS: The relational model designs data structures as relations (tables) with attributes (columns) and the relationship between those relations. Definitions in this model are expressed in a pure declarative way not predetermining any implementation issue like links from one relation to another one or a certain sequence of rows in the database. Relationships purely base upon content. At runtime all linking and joining is done by evaluating the actual data values, e.g.:
... WHERE employee.department_id = department.id .... The consequence is that - with the exception of explicit foreign keys - there is no meaning of a parent/child or owner/member denotation. Relationships in this model do not have any direction.
- The relational model and SQL are based on the mathematical theorie of relational algebra.
- During the 1980s and 1990s proprietary and open source DBMS based on the relational design paradigm has established themselves as market leaders.
- Object oriented DBMS: Nowadays most applications are written in an object oriented programming language (OOP). If in such cases the underlaying DBMS belongs to the class of relational DBMS, the so called object-relational impedance mismatch arises, that is to say in contrast to the application language pure relational DBMS (prDBMS) do not support central concepts of OOP:
-
- Type system: OOP offers the facility to define own classes with complex internal structures. They are build on primitive types, system classes, references to other or the same class. prDBMS knows only predefined types. Additionally prDBMS insists in first normal form, what means that attributes must be scalar. In OOP they may be sets, lists or arrays of the desired type.
- Inheritance: Classes of OOP may inherit attributes and methodes from their superclass. This concept is not known by prDBMS.
- Polymorphism: The runtime system can decide via late binding which one of a group of methodes with same name and parameter types will be called. This concept is not known by prDBMS.
- Encapsulation: Data and access methods to data are stored within the same class. It is not possible to access the data directly - the only way is using the access methods of the class. This concept is not known by prDBMS.
- Object oriented DBMS are designed to overcome the gap between prDBMS and OOP. At their best time they reached a weak market position in the mid and late 1990s. Afterwards some of their concepts are incorporated into the SQL standard as well as rDBMS implementations.
- NoSQL: The term NoSQL stands for the emerging group of DBMS which differs from others in central concepts:
- They do not necessarily support all aspects of the ACID paradigm.
- The data must not necessarily be structured according to any schema.
- Their goal is support for fault-tolerant, distributed data with very huge volume.
- Implementations differs widely in storing techniques: you can see key-value stores, document oriented databases, graph oriented databases and more.
-
- They do not offer a SQL interface. In 2011 an initiative started to define an alternative language: Unstructured Query Language.