As outlined above, rDBMS acts on the data with operations of relational algebra like projections, selections, joins, set operations (union, except and intersect) and more. The operations of relational algebra are denoted in a mathematical language which is highly formal and hard to understand for end users and - possibly also - for many software engineers. Therefore rDBMS offers a layer above relational algebra, which is easy to understand but nevertheless can be mapped to the underlying relational operations. Since the 1970s we have seen some languages doing this job, one of them was SQL - another example was QUEL. In the early 1980s (after a rename from its original name SEQUEL due to trademark problems) SQL achieved market dominance. And in 1986 SQL was standardized for the first time. The current version is SQL 2011.
The tokens and syntax of SQL are oriented on English common speech to keep the access barrier as small as possible. An SQL command like
UPDATE employee SET salary = 2000 WHERE id = 511; is not far away from the sentence "Change employee's salary to 2000 for the employee with id 511."
The next simplification is that all key words of SQL can be expressed in any combination of upper and lower case characters. It makes no difference whether
UPDATE, update, Update, UpDate or any other combination of upper and lower case characters is written. The keywords are case insensitive.
Next SQL is a descriptive language, not a procedural one. It does not pre-decide all aspects of the relational operations (which operation, their order, ...) which are generated from the given SQL statement. The rDBMS has the freedom to generate more than one execution plan from a statement. It compares the genereated execution plans with each other and runs this one, it thinks is best in the given situation. Additionally the end user is freed from all the gory details of data access, e.g.: Which one of a set of WHERE criteria should be evaluated first if they are combined with AND?
Despite those simplifications SQL is very powerful. Especially since it allows the manipulation of a set of data records with one single statement.
UPDATE employee SET salary = salary * 1.1 WHERE salary < 2000; will affect all employee records with an actual salary smaller than 2000. Potentially, there may be thousands of those records, only a few or even zero. It may also be noted that the operation is not a fix manipulation. The wording
SET salary = salary * 1.1 leads to an increase of the salaries by 10%, which may be 120 for one employee and 500 for another one.
The designer of SQL tried to define the language elements orthogonally to each other. Among other things this refers to the fact that any language element may be used in all positions of a statement where the result of that element may be used directly. E.g.: If you have a function power() which takes two numbers and returns another number, you can use this function in all positions where numbers are allowed. The following statements are syntactically correct (if you have defined the function power() ) - and lead to the same resulting rows.
SELECT salary FROM employee WHERE salary < 2048; SELECT salary FROM employee WHERE salary < POWER(2, 11); SELECT POWER(salary, 1) FROM employee WHERE salary < 2048;
Another example of orthogonality is the use of subqueries within UPDATE, INSERT, DELETE or inside another SELECT statement.
However, SQL is not free of redundancy. Often there are several possible formulations to express the same situation.
SELECT salary FROM employee WHERE salary < 2048; SELECT salary FROM employee WHERE NOT salary >= 2048; SELECT salary FROM employee WHERE salary BETWEEN 0 AND 2048; -- 'BETWEEN' includes edges
This is a very simple example. In complex statements there may be the choice between joins, subqueries and the exists predicate.
Core SQL consists of statements. Statements consist of key words, operators, values, names of system- and user-objects or functions. The statements are concluded by a semicolon. In the statement
SELECT salary FROM employee WHERE id < 100; the tokens SELECT, FROM and WHERE are key words. salary, employee and id are object names, the "<" sign is an operator and "100" is a value.
The SQL standard arranges statements into 9 groups:
- "The main classes of SQL-statements are:
- SQL-schema statements; these may have a persistent effect on the set of schemas.
- SQL-data statements; some of these, the SQL-data change statements, may have a persistent effect on SQL data.
- SQL-transaction statements; except for the <commit statement>, these, and the following classes, have no effects that persist when an SQL-session is terminated.
- SQL-control statements.
- SQL-connection statements.
- SQL-session statements.
- SQL-diagnostics statements.
- SQL-dynamic statements.
- SQL embedded exception declaration."
This detailed grouping is unusual in common speech. Usually it is distinguish between three groups:
- Data Definition Language (DDL): Managing the structure of database objects (create/drop tables, views, columns, ...)
- Data Manipulation Language (DML): Managing and retrieval of data with the statements INSERT, UPDATE, DELETE, SELECT, COMMIT and ROLLBACK.
- Data Control Language (DCL): Managing access rights.
Hint: In some publications the SELECT statement is said to build its own group Data Query Language. This group has no other statements than SELECT.
Core SQL as described above is not Turing complete. It misses conditional branches, variables, subroutines. But the standard as well as most implementations offers an extension to fulfill the demand for Turing completeness. In 'Part 4: Persistent Stored Modules (SQL/PSM)' of the standard there are definitions for IF-, CASE-, LOOP-, assignment- and other statements. The existing implementations of this part have different names, different syntax and also a different scope of operation: PL/SQL in Oracle, SQL/PL in DB2, Transact-SQL or T-SQL in SQL Server and Sybase, PL/pgSQL in Postgres and simply 'stored procedures' in MySQL.