Structured Query Language/SQL: A Language for Working with rDBMS

      History

      As outlined above rDBMS acts on the data with operations of the relational algebra like projections, selections, joins, set operations (union, difference, intersection, ...) and more. The operations of the relational algebra are notated in a mathematical language which is highly formal and hard to understand for end users and - possibly also - for many software engineers. Therefor rDBMS offers a layer above relational algebra, which is easy to understand but nevertheless can be mapped to the underlaying relational operations. Since the 1970s we have seen some languages doing this job, one of them was SQL - another example was QUEL. In the early 1980s (after a rename from its original name SEQUEL due to trademark problems) SQL achieved market dominance. And in 1986 SQL was standardized for the first time. The actual version is SQL 2011.

      ↑Jump back a section

      Characteristics

      The tokens and syntax of SQL are oriented on the English common speech to keep the access barrier as small as possible. A SQL command like UPDATE employee SET salary = 2000 WHERE id = 511; is not fare away from the sentence "Change employees salary to 2000 for the one with id 511."

      The next simplification is that all key words of SQL can be expressed in any combination of upper and lower case characters someone prefers. It makes no difference whether you write UPDATE, update, Update, UpDate or any other combination of upper and lower case characters. The keywords are case insensitive.

      Next SQL is a descriptive language, not a procedural one. It does not pre-decide all aspects of the relational operations (which operation, their order, ...) which are generated from the given SQL statement. The rDBMS has the freedom to generate more than one execution plan from it and run this one, from which it thinks it is the best in the given situation. Additionally the end user is freed from all the gory details of data access, e.g.: Which one of a set of WHERE criteria should be evaluated first if they are combined with AND?

      Despite of those simplifications SQL is very powerful. Especially it allows the manipulation of a set of data records with one single statement. UPDATE employee SET salary = salary * 1.1 WHERE salary < 2000; will affect all employee records with an actual salary smaller than 2000: there may be thousands of those records, only few or even zero. And you may have recognized that the operation is not a fix manipulation. The wording SET salary = salary * 1.1 leads to a raise of the salaries by 10%, which may be 120 for one employee and 150 for another one.

      The designer of SQL tried to define the language elements orthogonal to each others. Among other things that means that any language element may be used at all positions of a statement where the result of that element may be used directly. E.g.: If you have a function power() which takes two numbers and returns another number, you can use this function at all places where numbers are allowed. The following statements are syntactically correct (if you have defined the function power() ) - and they lead to the same data records, but this aspect is not important here.

      SELECT salary FROM employee WHERE salary < 2048;
      SELECT salary FROM employee WHERE salary < POWER(2, 11);
      SELECT POWER(salary, 1) FROM employee WHERE salary < 2048;
      

      Another example of orthogonality is the use of a subqueries within UPDATE, INSERT, DELETE or insight another SELECT statement.

      Nevertheless SQL is not free of redundancy. Often there are a lot of choices you can choose from to express the same situation.

      SELECT salary FROM employee WHERE salary < 2048;
      SELECT salary FROM employee WHERE NOT salary >= 2048;
      SELECT salary FROM employee WHERE salary BETWEEN 0 AND 2047.99;
      

      is a simple example. In complex statements you may have the choice between joins, subqueries and the exists construct.

      ↑Jump back a section

      Fundamentals

      Core SQL consists of statements. Statements consists of key words, operators, values, names of system and user objects or functions. They are concluded by a semicolon. In the statement SELECT salary FROM employee WHERE id < 100; the tokens SELECT, FROM and WHERE are key words. salary, employee and id are object names, the "<" sign is an operator and "100" is a value.

      The SQL standard arranges statements into 9 groups:

      "The main classes of SQL-statements are:
      SQL-schema statements; these may have a persistent effect on the set of schemas.
      SQL-data statements; some of these, the SQL-data change statements, may have a persistent effect on SQLdata.
      SQL-transaction statements; except for the <commit statement>, these, and the following classes, have no effects that persist when an SQL-session is terminated.
      SQL-control statements.
      SQL-connection statements.
      SQL-session statements.
      SQL-diagnostics statements.
      SQL-dynamic statements.
      SQL embedded exception declaration."

      This detailed grouping is unusual in common speech. Usually we distinguish between three groups:

      Data Definition Language (DDL): Managing the structure of database objects (create/drop tables, views, columns, ...)
      Data Manipulation Language (DML): Managing and retrieval of data with the statements INSERT, UPDATE, DELETE, SELECT, COMMIT and ROLLBACK.
      Data Control Language (DCL): Managing access rights.

      Hint: In some publications the SELECT statement is said to build its own group Data Query Language. This group has no other statements than SELECT.

      ↑Jump back a section

      Turing completeness

      Core SQL as described above is not Turing complete. It misses conditional branches, variables, subroutines. But the standard as well as most implementations offers an extension to fulfill the demand for Turing completeness. In 'Part 4: Persistent Stored Modules (SQL/PSM)' of the standard there are definitions for IF-, CASE-, LOOP-, Assignment- and other statements. In existing implementations this part has different names, different syntax and also a different scope of operation: PL/SQL in Oracle, SQL/PL in DB2, Transact-SQL or T-SQL in MS and Sybase, PL/pgSQL in Postgres and simply 'stored procedures' in MySQL.

      ↑Jump back a section
      Last modified on 30 May 2012, at 18:47