Unit 1.3.2 Databases

Databases are in use by almost all organisations today. They allow for data to be quickly retrieved, updated and filtered. They allow certain users access to data to be restricted (i.e. users can only see what they are permitted to see), which is good for security as well as focusing on work. They also reduce the likelihood of inconsistencies or errors.

Types of Files

Historically, data was stored in separate files. They were typically stored in serial or sequential files as data was stored on magnetic tapes, which had to be written to or read from in sequence.

Serial Files

Records in a serial file are stored one after another. This was the only way to store data on a long thin medium such as tape. The file could contain as much or as little data as necessary, but each record has to have the same structure. To locate a record, the whole file had to be searched from start to finish. For a large file this could be a lengthy process.

Sequential Files

Records in a sequential file are stored one after another, but in a certain order. This makes searching for a specific record much easier. It can however create problems, such as in a days library transactions. There is no particular order to follow (values such as book number, borrower number or date borrowed) other than chronological, which would be unhelpful.

This means that at intervals, the data must be sorted. This involves writing the data to another file and is only a partial solution. Searching through the file is still time consuming and cannot be done until the sorting is complete.

Indexing

Indexing allows sequential files to be searched more quickly by producing a separate index file. The data is divided into categories, which are linked to a position in the file where the category starts.

Even with all these techniques, there are still inefficiencies due to searching and sorting. Simple databases in these formats are flat-file databases.

Fixed and Variable Length Fields

Fixed Length

A fixed length field is one where each field is always the same number of bytes long. This allows the software to count bytes in order to count fields. This is an easy method to program and is quick to search and easy to calculate the file size for a planned database, but storage space can be wasted as not all values will use the allocated space for the field.

Variable Length

Each field is delimited by a unique identifier, such as a comma. This allows for a much smaller and more flexible file.

Hashing

A key field of a record can be transformed to create a disk address. This allows a device to go straight to a certain part of a disk and start there. This method works very well with sparse databases.

Flat File Databases

Pros

Quick and easy to create
Fine for small amounts of data
Great for single-entity models

Cons

Data redundancy - repeated data is present
Data inconsistency - more likely to be errors in repeated data
Reduced data integrity
Data dependence - changing the formats of data could require changes to whole programs.
Queries and reports are more challenging in a flat-file database.

Relational Databases

In a relational database, data is stored in separate tables. Each table stores data about a specific entity. There are certain rules a relational database should abide by:

Every row must be constructed in the same way i.e. each column must only contain one data type.
One column, or a combination of columns, must make each row unique. This unique identifier is called a primary key, if several columns are used it is called a composite primary key.
There are no specific rules about row or column order.
No two tuples (rows) in a relation (table) can be identical.

Tables in a relational database are linked by relationships. Relationships between records are produced by setting the value of a foreign key field to that of the primary key of the desired record in another table.

Secondary Keys

A secondary key allows a relation to be searched quickly, by linking similar items with a non-unique key. For example, a postcode is not unique to a single house, but it can be used to quickly find a house in a neighbourhood.

Entity Relationship Modelling

Data is stored in separate tables to avoid data redundancy. It is not however always obvious how to separate out entities in a database, so the process of database normalisation is used.

Database Normalisation

Normalisation is the process of designing a relational database to be the most effective, making it more efficient and useful. It centres around reducing redundant data and ensuring data dependencies. It is a collection of steps which gradually leads to a normal form.

Benefits of normalisation:

No data redundancy as data is stored in one place, saving storage space
Data integrity is maintained as data is not repeated it cannot become inconsistent
Referential integrity enforced stopping related records becoming orphans (by preventing the one side of a one-to-many relationship from being deleted)
Faster searching as tables are smaller with fewer fields
More complex queries can be used as data can be linked across related tables

Unnormalised Form (UNF / ONF)

There is data redundancy and may have complex data structures within a single attribute.

First Normal Form (1NF)

Each field only contains one piece of data for that attribute. For example, a subjects field in a students table only includes one subject.

To convert from UNF to 1NF:

Eliminate duplicate attributes from the same table
Eliminate entries containing complex data structures in a single attribute by using multiple entries
Identify (a) column(s) which will uniquely identify each entry

Second Normal Form (2NF)

1NF and has no partial dependencies (no attributes are dependent on part of a composite primary key).

To convert from 1NF to 2NF:

Remove any datasets occurring in multiple rows and transfer them to new tables
Create relationships between these new tables and earlier tables with foreign keys

Third Normal Form (3NF)

2NF and contains no non-key dependencies (where an attribute is dependent on the value of another attribute which isn't the key).

All attributes are dependent on the key, the whole key, and nothing but the key.

To convert from 2NF to 3NF:

Remove any columns which are not dependent on the primary key and transfer them to new tables

Entity Relationship Diagrams

These diagrams demonstrate the relationships between entities:

Examples of different types of database relationship.

Database Management System (DBMS)

The DBMS is a type of software which is used to create and manage databases .They provide users with a systematic way to retrieve, order, update and manage data. It acts as an intermediary between applications accessing the database and the data stored in the database. The DBMS deals with:

database structure
individual tables
queries
interfaces
views
outputs

The DBMS also has protective and maintenance roles, including:

the setup and maintenance of access rights
automating backups
preserving referential integrity^[1]
maintaining indexes
updating the database

To maintain referential integrity, the DBMS ensures foreign keys correspond to the primary key of a record in the linked table. This can be enforced by preventing records from being deleted if they are referenced by records in other tables or by cascading and deleting records referencing the one being deleted.

Database Views

Physical View

A view of the actual data storage. This is only the concern of the DBMS system engineers, as it shows how data items are stored on the disk.

Logical View

This looks at the construction of tables, queries, reports and software which will deliver the functionality of the database to users of the system.

User View

Outward appearance of the database, and the user interface for the end user.

Structured Query Language (SQL)

This is a language designed to manage data in the database. It can be used to access, insert and change data and has the ability to create, modify and delete tables and relationships in a database.

Basic Statements

Retrieving values of specific fields from a table

SELECT name, city FROM customers;

Returns the value of name and city columns for each record in the customers table.

Retrieving a set of records using a condition

SELECT * FROM customers WHERE country = "Mexico";

Returns all columns for records in the customers table where the country field is Mexico.

Retrieving a set of records using multiple conditions

SELECT * FROM customers WHERE country = "Germany" and city = "Berlin";

Returns all columns for records in the customers table where the country field is Germany and city is Berlin.

SELECT * FROM customers WHERE city = "Berlin" or city = "Munich";

Returns all columns for records in the customers table where the city field is either Berlin or Munich.

Deleting records from a table

DELETE FROM customers WHERE name = "Jordan";

Deletes all records from the customers table where the name field is Jordan.

Inserting records into a table

INSERT INTO customers (name, country, city) VALUES ("Matt","England","London");

Inserts a new record into the customers table with name of Matt, country England and city London.

Deleting a whole table

DROP TABLE customers;

Deletes the customers table and all its records.

Retrieve linked records from multiple tables

SELECT name, cost FROM customers JOIN orders ON customers.id = order.customer_id;

Returns the customer name and order cost for all records in the orders table.

ACID

ACID is a set of rules which protect the integrity of a database, they describe the properties a transaction must conform to. A transaction is a single logical operation performed on a database.

Atomicity - A transaction must be performed or not performed, half finished transactions cannot be saved. If the transaction fails, the database is not altered.
Consistency - Any transaction must take the database from one valid state to another. For example, when transferring money between accounts, money moved into one account must be balanced with money taken out of another.
Isolation - Transactions must not interfere with other transactions. Transactions can be queued (transaction processing) when a record is locked so that transactions occur sequentially.
Durability - Once a transaction is complete it is permanently stored. In the event of a system failure the transaction and its actions will remain. Usually this involves storing the database in a form of non-volatile memory^[2] like secondary storage.

CRUD

These are the four basic functions of persistent storage:

Create - New entities can be created and stored in a table.
Read - Information about the table's entities can be retrieved and outputted.
Update - Information about entities can be modified
Delete - Entities can be deleted from a table.

Notes

References

↑ Referential integrity: the state of the database being consistent
↑ Contents of non-volatile memory are retained when power is lost

[1] Referential integrity: the state of the database being consistent

[2] Contents of non-volatile memory are retained when power is lost

[1]

[2]