105.3 SQL Data Management

Candidates should be able to query databases and manipulate data using basic SQL commands. This objective includes performing queries involving joining of 2 tables and/or subselects.

Key Knowledge Areas

Use of basic SQL commands.
Perform basic data manipulation.

The Structured Query Language (SQL) is a database management programming language. SQL is a tool for accessing databases, and more specifically, relational databases, and can be used with different database products. This chapter will prepare you to learn basic database management using this language.

Common SQL Implementations for Linux

Some of the more common choices in Linux include the following:

MySQL - This SQL Implementation is owned by Suns, and is released under the GPL. Most major Linux distributions will include MySQL in their package databases.

PostgreSQL - Released under the BSD license, PostgreSQL was evolved from Ingres software. (PostgreSQL= post-Ingres SQL). It’s available as multiple packages in most Linux distributions.

SQLite – To implement SQL as a library, you need SQLite. SQLite is intended to provide users and programs a way to store data using a SQL interface within the program. SQLite3 can be used to manipulate SQLite databases for major Linux distros.

SQL Basics

SQL is used to access relational databases. Each database contains more or less tables which in turn contain more or less rows and columns. Hereby a single row is seen as a separate object with features represented by the tables' columns. To access a table's data you first have to connect to its database.

1. mysql -u USERNAME -p PASSWORD
2. use DATABASE

Basic SQL Commands

SELECT

A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used Data Manipulation Language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.

The SELECT statement has many optional clauses:

WHERE specifies which rows to retrieve.
GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
HAVING selects among the groups defined by the GROUP BY clause.
ORDER BY specifies an order in which to return the rows.

Examples:

Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.

With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown.

With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown.

WHERE

A WHERE clause specifies that a SQL statement should only affect rows that meet specified criteria. The criteria are expressed in the form of predicates. WHERE clauses are not mandatory clauses of SQL statements, but should be used to limit the number of rows affected by a SQL DML statement or returned by a query.

The following query returns only those rows from table mytable where the value in column mycol is greater than 100.

SELECT *

FROM mytable

WHERE mycol > 100

DISTINCT

DISTINCT will eliminate all duplicate rows from the selection. DISTINCT ON column will eliminate all duplicates in the specified column; this is equivalent to using GROUP BY column. ALL will return all candidate rows, including duplicates.

SELECT DISTINCT jobTitle FROM employees;

+----------------------+

| jobTitle |

+----------------------+

| President |

| VP Sales |

| VP Marketing |

| Sales Manager |

(APAC) |

+----------------------+

GROUP BY

The GROUP BY clause is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

SELECT Book.title,

count(*) AS Authors

FROM Book JOIN Book_author

ON Book.isbn = Book_author.isbn

GROUP BY Book.title;

Example output might resemble the following:

Title Authors

---------------------- --------------- ---------

SQL Examples and Guide 4

The Joy of SQL 1

An Introduction to SQL 2

Pitfalls of SQL 1

HAVING

A HAVING clause in SQL specifies that an SQL SELECT statement should only return rows where aggregate values meet the specified conditions. It was added to the SQL language because the WHERE keyword could not be used with aggregate functions.

To return a list of department IDs whose total sales exceeded $1000 on the date of January 1, 2000, along with the sum of their sales on that date:

SELECT DeptID, SUM(SaleAmount)

FROM Sales

WHERE SaleDate = '01-Jan-2000'

GROUP BY DeptID

HAVING SUM(SaleAmount) > 1000

ORDER BY

The ORDER BY clause allows a user to specify that he/she wishes the rows sorted according to the ASCending or DESCending mode operator.

SELECT * FROM Employees

ORDER BY LastName, FirstName

IN

IN will find any values existing in a set of candidates.

SELECT ename WHERE ename IN ('value1', 'value2', ...)

All rows match the predicate if their value is one of the candidate set of values. This is the same behavior as

SELECT ename WHERE ename='value1' OR ename='value2'

except that the latter could allow comparison of several columns, which each IN clause does not. For a larger number of candidates, IN is less verbose.

BETWEEN

BETWEEN will find any values within a range.

SELECT ename WHERE ename BETWEEN 'value1' AND 'value2'

All rows match the predicate if their value is between 'value1' and 'value2', inclusive.

LIKE

LIKE will find a string fitting a certain description.

Ending Wildcard
- Find any string that begins with the letter 'S'

SELECT ename FROM emp WHERE ename LIKE 'S%';

Leading Wildcard
- Find any string that ends with the letter 'S'

SELECT ename FROM emp WHERE ename LIKE '%S';

Multiple Wildcards
- Find any string that contains, anywhere, the letter 'S'

SELECT ename FROM emp WHERE ename LIKE '%S%';

Single Character Wildcard
- Find any string that contains the letter 'A' followed by any single character followed by the letter 'E'

SELECT ename FROM emp WHERE ename LIKE '%A_E%';

SQL programmers need to be aware that the LIKE predicate typically performs a search without the normal performance benefit of indexes. Using '=', '<>', etc.. instead will increase performance. Users of the LIKE predicate should be aware that case sensitivity (e.g., 'S' versus 's') may be different based upon database product or configuration.

UNION

In SQL the UNION clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL is used.

Note that UNION does not guarantee the order of rows. Rows from the second operand may appear before, after, or mixed with rows from the first operand. In situations where a specific order is desired, ORDER BY must be used.

Note that UNION ALL may be much faster than plain UNION.

sales2005
person	amount
Joe	1000
Alex	2000
Bob	5000

sales2006
person	amount
Joe	2000
Alex	2000
Zach	35000

Executing this statement:

SELECT * FROM sales2005

UNION

SELECT * FROM sales2006;

yields this result set, though the order of the rows can vary because no ORDER BY clause was supplied:

person	amount
Joe	1000
Alex	2000
Bob	5000
Joe	2000
Zach	35000

Note that there are two rows for Joe because these rows are distinct across their columns. There is only one row for Alex because these rows are not distinct for both columns.

JOIN

A SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT. In special cases, a table (base table, view, or joined table) can JOIN to itself in a self-join.

A programmer writes a JOIN predicate to identify the records for joining. If the evaluated predicate is true, the combined record is then produced in the expected format, a record set or a temporary table.

SELECT column_list FROM table_1 [INNER | LEFT | RIGHT] table_2 ON conditions_2 [INNER | LEFT | RIGHT] table_3 ON conditions_3 ... WHERE conditions

LEFT JOIN can be used when you want to retrieve the data from the main table (table_11) even if there is no match in other tables (table_2, table_3....). While RIGHT JOIN is used to retrieve the data from all other tables (table_2, table_3...) even if there is no match in the main table.

INNER JOIN

An inner join is the most common join operation used in applications and can be regarded as the default join-type. Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of A and B are combined into a result row. The result of the join can be defined as the outcome of first taking the Cartesian product (or cross-join) of all records in the tables (combining every record in table A with every record in table B)—then return all records which satisfy the join predicate. Current SQL implementations normally use other approaches like a hash join or a sort-merge join where possible, since computing the Cartesian product is very inefficient.

SQL specifies two different syntactical ways to express joins: "explicit join notation" and "implicit join notation".

The "explicit join notation" uses the JOIN keyword to specify the table to join, and the ON keyword to specify the predicates for the join, as in the following example:

SELECT *

FROM employee INNER JOIN department

ON employee.DepartmentID = department.DepartmentID;

The "implicit join notation" simply lists the tables for joining (in the FROM clause of the SELECT statement), using commas to separate them. Thus, it specifies a cross-join, and the WHERE clause may apply additional filter-predicates (which function comparably to the join-predicates in the explicit notation).

INSERT

INSERT INTO table [ ( column [, ...] ) ]

{ VALUES ( expression [, ...] ) | SELECT query }

INSERT allows one to insert new rows into a table. One can insert a single row at a time or several rows as a result of a query. The columns in the target list may be listed in any order. In every column not present in the target list, its default value will be inserted. If a column has no declared default value it will be assumed as NULL. If the expression for each column is not of the correct data type, automatic type coercion will be attempted.

INSERT INTO tbl_movies(title,year) VALUES(‘Alexander’,’2004’)

UPDATE

UPDATE table SET column = expression [, ...]

[ FROM fromlist ]

[ WHERE condition ]

UPDATE changes the values of the columns specified for all rows which satisfy condition. Only the columns to be modified need appear as column. Array references use the same syntax found in SELECT. That is, either single array elements, a range of array elements or the entire array may be replaced with a single query. You must have write access to the table in order to modify it, as well as read access to any table whose values are mentioned in the WHERE condition.

UPDATE tbl_movies SET genre = 'Dramatic' WHERE genre = 'Drama';

The following is a partial list of the files, terms and utilities that were used.

insert
update
select
delete
from
where
group by
order by
join

Previous Chapter | Next Chapter

Ict-innovation/LPI/105.3

105.3 SQL Data Management

Common SQL Implementations for Linux

Basic SQL Commands