Applied Programming/Databases

Overview

edit

Databases

edit

What is a database?

edit

A database is a data structure that stores organized information. Most databases contain multiple tables, which may each include several different fields. For example, a company database may include tables for products, employees, and financial records. Each of these tables would have different fields that are relevant to the information stored in the table.[1]

A database is stored as a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Records are also organized into tables that include information about relationships between its various fields. Although database is applied loosely to any collection of information in computer files, a database in the strict sense provides cross-referencing capabilities. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.[2]

Why do we use them?[3]

edit

The three main advantages that databases have over other, simpler data storage systems (such as text files and spreadsheets) are access, integrity, and security.

Access

edit

Access is about making data available to users.

Databases support good data access because:

  • Large volumes of data can be stored in one place
  • Multiple users can read and modify the data at the same time
  • Databases are searchable and sortable, so the data you need can be found quick and easily
  • The data structure is extendable and can be modified as requirements change

Integrity

edit

Databases can ensure that the data contained within them is correct, or has integrity.

To ensure the integrity of a database, each change or transaction must conform to a set of rules known as ACID:

  • Atomicity: when changing data within a database, if any part of the change fails, the whole change will fail and the data will remain as it was before the change was made; this prevents partial records being created
  • Consistency: before data can be changed in a database, it must be validated against a set of rules
  • Isolation: databases allow multiple changes at the same time, but each change is isolated from others
  • Durability: once a change has been made, the data is safe, even in the event of system failure

In addition, databases will have mechanisms for backup, distribution, and redundancy, to ensure data is not lost.

Security

edit

While access to text files or spreadsheets can be secured, once someone has access to a file, they have access to all data within that file. Databases can be made very secure, and that includes the ability to have access rights to specific parts of the database and not others.

Databases allow access to be controlled, allowing users to have different privileges: for example, some users may be able to read data, but not to write it.

Data can also be segmented so that users can access only certain parts: for example, a user may be able to read an employee’s name and address but not their salary information.

What other uses for a database can you think of? Share your thoughts in the comments.

In the next step, you will explore an example relational database and look at the data contained within it.

Relational Model

edit

What is it?

edit

[4]

The relational model is an abstract model used to organize data within a database. In order to control access to a database, write data, run queries, or perform any other tasks related to database management, a database management system must have some kind of underlying model that defines how the data within it are organized. Databases that implement the relational model are often referred to as relational databases. The relational model was for a long time the most sophisticated model for organizing data, and its widespread use has only recently been curbed by the rise of nonrelational — or, NoSQL — data models.

The most fundamental elements in the relational model are relations, which users and modern RDBMSs recognize as tables. A relation is a set of tuples, or rows in a table, with each tuple sharing a set of attributes, or columns. A column is the smallest organizational structure of a relational database, and represents the various facets that define the records in the table. Hence their more formal name, attributes. It can be helpful to think of each tuple as a unique instance of whatever type of people, objects, events, or associations the table holds.

Constraints

edit

[5]

Domain constraints

Domain constraints can be violated if an attribute value is not appearing in the corresponding domain or it is not of the appropriate data type. Domain constraints specify that within each tuple, and the value of each attribute must be unique. This is specified as data types which include standard data types integers, real numbers, characters, Booleans, variable length strings, etc.

Key constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The value of the attribute for different tuples in the relation has to be unique.

Referential integrity constraints

Referential Integrity constraints in DBMS are based on the concept of Foreign Keys. A foreign key is an important attribute of a relation which should be referred to in other relationships. Referential integrity constraint state happens where relation refers to a key attribute of a different or same relation. However, that key element must exist in the table.

Example in Database

edit

[6]

An idealized, very simple example of a description of some relvars (relation variables) and their attributes:

  • Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email, Sex)
  • Order (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
  • Order Line (Order No, Order Line No, Product Code, Qty)
  • Invoice (Invoice No, Customer ID, Order No, Date, Status)
  • Invoice Line (Invoice No, Invoice Line No, Product Code, Qty Shipped)
  • Product (Product Code, Product Description)

In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.

Usually one candidate key is chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.

A candidate key is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (that includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.

SQLite

edit

Activities

edit

Key Terms

edit

Attribute - A particular feature which describes and is characteristic of a record.

Commit - Ends a transaction within a relational database management system (RDBMS) and makes all changes visible to other users.[7]

Connection - read-only attribute provides the SQLite database Connection used by the Cursor object. A Cursor object created by calling con.cursor() will have a connection attribute that refers to con.

Cursor - A database cursor is an object that enables traversal over the rows of a result set. It allows you to process individual row returned by a query.

Database - And organized collection of data.[7]

Database Management System - There are multiple types of DBMS, (Relational, Object Oriented, etc), but each variant allows for reading, inserting, updating, deleting of a database along with general administrative/security tools. [7]

Delete - Removes one or more records from a table.[7]

Heading - A set of attributes.[7]

Insert - Adds one or more records to a single table.[7]

Primary key - A specific choice of one or more columns (attributes) whose data uniquely identifies each row (tuple) in a table (relation).[7]

Query - Retrieves/creates data based on specific criteria. Queries must follow SQL rules and syntax.[7]

Retrieval - Providing information in a form directly usable or for further processing by other applications.[7]

Record - An instance of an entity in a database table.[7]

Schema - A set of integrity constraints that structures and establishes the rules for data in a database.[7]

Select - A common DQL command used to return a result set of records, either from a single or from multiple tables.[7]

SQL - Structured Query Language, a programmatic standard for querying data from a relational database.[7]

SQLite - A relational database management system (abbreviated RDBMS) embedded directly into a program.[7]

Update - Changes specified data of one or more records in a table.[7]

View - Set of rows from a database of a stored query on the data, which the database users can query just as they would in a persistent database collection object.[7]


References

edit