DBMS

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/DBMS

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Introduction

Data: raw facts and figures
Data Processing: performing operations on the input data to generate output.
Database: collection of related information about a particular enterprise.
Database Management System: collection of interrelated data, set of programs to access the data conveniently and efficiently.

File System

Typically data is stored in files.

1.) flat files contains one table at a time generally

2.) flat files contains values at each row and separated with a special symbol so to reach the data you have to parse each row and obtaining an array of values and then you can query the data.

3.) to control the data in a file , you have to read it line by line and parse it and because of it they are slow

4.) there is no control mechanism in files

5.) dbms systems commonly has a special language called sql to modify and reach the data easily

6.) dbms systems has indexes to reach the data, not read it line by line

7.) there are much more control mechanisms that approves the correctness of data

8.) you can reach the data across networks by using a dbms system easily and safely

Need for Database Management Systems

Data access through file systems suffers from the following problems.

minimal data redundancy
difficulty in accessing data
inconsistency
concurrent access
data isolation
integrity
atomicity
security

A good database management system solves all the above mentioned problems which a user generally faces in a file system.

Database Users

Following are the types of database users:

Database Administrator(DBA) - a database administrator performs the following activity
- Granting user authority to access the database
- Enforcing security and integrity rules
- Strategizing backup & recovery
- enforcing security and integrity rules
Application programmers
Data analysts
Naive users

Enterprise Application Architecture

three tier architecture: Modern enterprise application architecture consists of the following layers

application client
application server
database system

Present trend in the enterprise architecture is moving towards n-tier architecture in which each of the above three layers is further distributed among multiple systems or layers

Database Tools

query tools
administration tools

Database Architecture

Following are the different database architecture types:

Centralized
Client-server
Parallel (multi-processor)
Distributed

Database abstraction

physical abstraction
logical abstraction
view abstraction

To the design of complex data structure for the representation of data in the database. However since database system are often used by non computer professionals, this complexcity must be hidden from database system users. This is done by defining level of abstract as which the database may be viewed, There are logical view or external, conceptual view and internal view or physical view.

External view

This is the highest level of abstraction as seen by a user. This level of abstraction describe only part of entire database.

Conceptual view :

This is the next lower level of abstraction which is the sum total of user's views. This level describe what data are actually stored in the database. This level contain information about entire database in term of small number of relatively simple structure.

Internal Level :

This is the lowest level of abstraction at which one describe how the data are physically sorted.

Database Schema

Definition : Overall logical design of data base. Schema contains 'No of records + Type of data + No of attributes'

External level or Sub schema
logical schema
physical schema

Database Instance

Definition: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.

Definition: The information stored in database at the particular movement is called instance.

Also Known As: environment is the place where the data can be stored in manageable manner. Examples: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Data Model

What is data model:it consists of some concepts to describe the structure of database i.e., data type, relations, and constraints that should hold on the data. E.g. ER model

XML data Model - semi structured
relational data model
object oriented
object relational
network
hierarchy

Database Languages

procedural - what data is required and how to get this data
declarative - what data is required without describing how to get this data
data control language (DCL)
data manipulation language (DML)
data definition language (DDL)
data storage language

Relational Database

A database based on relational algebra or relation model is called relation database

Relational data model

relation: a table in a relational database is called relation in the mathematical language of relational algebra. relations are unordered.
attribute: column of a table in database table is called attributes. columns or attributes have names.
domain: set of permissible values for an attribute ( or column) is called domain.
tuple: a row in the database table is called tuple in the mathematical language of relational algebra. order of tuples in a relation has no significance.
database: a database is a collection of multiple relations.
schema: a database design is called schema, alternatively, a schema can refer to namespace within a database.
cardinality of a relation: number of tuples in a relation is called cardinality of the relation.

Normalization theory deals with design of relational database schema.

Keys

key: any subset of a relation is called key.
super key: a key is called super key if it is sufficient to identify a unique tuple of a relation.
candidate key: a minimal super key is called candidate key i.e. no proper subset of a candidate key is super key.
primary key: a candidate key chosen as a principal to identify a unique tuple of a relation.It Restrict User Inputs like "Null Values" , "Duplicate Values" etc.
foreign key: a key of a relation which is a primary key of some other relation in the relational schema.

Database Design

entity: nothing but it represents field of the database e.g.,

employee

entity set: collection of different fields
relationship
relationship set
domain: The set of possible values for an attribute is called the

domain of the attribute e.g.: The domain of attribute marital status is having four values: single, married, divorced or widowed.

The domain of the attribute month is having twelve values ranging from January to December.

Key attribute: The attribute (or combination of attributes) that is

unique for every entity instance e.g.: the account number of an account, the employee id of an employee etc.

attribute: e.g., gender
composite attribute
degree of relationship: the number of entity types involved in a

database

cardinality of relationship: The minimum and maximum values of this

connectivity is called the cardinality of the relationship e.g., one to one, one to many, many to many

weak entity: Entity that depends on other entity for its existence and

doesn’t have key attribute (s) of its own e.g. : spouse of employee

identifying relationship
non-identifying relationship
discriminator

ER Diagram

entity relationship data model
Introduction to Data Modeling at University of Texas, Austin

Modelling concepts

Specialisation
Generalisation
Categorisation
Aggregation

Normalization

Normalization is the formal process for deciding which attributes should

be grouped together in a relation.The process of normalization was

first developed by E.F.CODD. Normalization is the process of evaluating and correcting the relation schemes to minimize data redundancies and thereby reducing anomalies.

First Normal Form

As per the rule of the 1NF an attributes of a table cannot hold multiple value. It should hold only atomic value.

Second Normal Form

Second Normal form is characterized by the property of functional dependency. a relation is in second normal form if it is in 1st normal form and every non key attribute is fully and functionally dependent on the primary key.

Third Normal Form

transitive dependency : A relation is in third normal form , if it is in second normal form and

no transitive dependencies exist.

Suppose A,B and C are the three attributes of a relation(R) then if A->B(B depends on A) B->C(C depends on B) then we can say that "C" depends transitively on "A".

Boyce Codd Normal Form (BCNF)

BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3rd normal form and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate key. now what is a determinant? consider the following functional dependency: A→B where A and B are attributes in relation R. it says that B is functionally dependent on A. here A is referred to as determinant and B is the dependent.

BCNF is slightly stronger than 3nF.

Fourth Normal Form

multivalued dependencies are removed

Fifth Normal Form

Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Anomalies can be : Insertion Anomaly,Deletion Anomaly or Modification/update Anomaly. Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Query Languages

query: the retrieval of tuples from the relations of a relational schema
query language: a language used to retrieve information (tuples) from the relations of a relational schema.

types of query language

procedural language
non-procedural language

Mathematical query languages

Relational algebra
Tuple relational calculus
Domain relational calculus

Relational algebra

relational algebra is a procedural query language

- operations
  1. selection (r σ s) -
  2. projection (r ∏ s) -
  3. union: (r ∪ s)
  4. set difference: (r – s)
  5. Cartesian product: (r X s)
  6. rename: ( ρr )
  7. intersection:
  8. natural join:
  9. division:
  10. union:
  11. intersection:
  12. outer join

BY:NA

Tuple relational calculus

A tuple variable is variable that takes on tuples of a particular relation schema as values. That is, every value assigned to a given tuple variable has the same number and type of fields. A tuple relation calculus query has the form {T I p(t)} where T is a tuple variable and p(T) denotes a formula that describes T; The result of this query is the set of all tuples t for which the formula p(T)evaluates to true with T=t.

Domain relational calculus

SQL

SQL stands for Structured Query Language. SQL is a non procedural language.

References

relational algebra

XML

Extensible Mark-up Language.

XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define your own tags XML is designed to be self-descriptive XML is a W3C Recommendation

XML

Application Design

JDBC

JDBC means Java Database Connectivity.

Storage Manager

Storage manager module of the database provides the interface between the following component:

data stored in the database
the application programs
queries submitted to the system

Components of a Storage Manager

file manager

function of the file manager is to manage disk space for storage and manage data structure used for storing information.

buffer manager

The buffer manager reads disk pages into a main memory page as needed. The collection of main memory pages (called frames) used by the buffer manager for this purpose is called the buffer pool. This is just an array of Page objects. The buffer manager is used by (the code for) access methods, heap files, and relational operators to read / write /allocate / de-allocate pages. The Buffer Manager makes calls to the underlying DB class object, which actually performs these functions on disk pages.

Replacement policies for the buffer manager can be changed easily at compile time.

Authorisation and integrity manager

Checks the authorization of users to access data and tests integrity constraints

transaction manager

Storage access

Indexing and hashing

Many queries reference only a small proportion of records in a file. For example, finding all records at Perryridge branch only returns records where bname = ``Perryridge''. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.

Parsing and translation

Optimization

Evaluation

As we know that, computer has two storage memories 1.primary storage 2.secondary storage

Query Processing

Query Processor
- Query parsing
- Query optimizer
- Query evaluation engine : It executes low level instructions given by the DML and retrieves the information from the storage manager.

Query Optimization

Transaction

transaction: collection logically related operations in a database application.
Transaction-management: ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

authorization engine

Concurrency control

Concurrency-control: ensuring the consistency of the database in spite of interaction among the concurrent transactions

Recovery

To restore a physical backup of a data file or control file is to reconstruct it and make it available to the Oracle database server. To recover a restored data file is to update it by applying archived redo logs and online redo logs, that is, records of changes made to the database after the backup was taken. If you use RMAN, then you can also recover data files with incremental backups, which are backups of a data file that contain only blocks that changed after a previous incremental backup.

Distributed Database Management System

Distributed Database management Systems are the software for managing databases stored on multiple computers in the network.

Distributed Database: It is a set of databases stored on multiple computers that typically appears to application as a single database. It stores related over two or more physically independent sites that requires distributed processing. A database that is composed of different database fragments / parts.

Data Mining and Analysis

Data mining is searching information which is already stored in such a database.

DBMS/Printable version

Introduction

Introduction

File System

Need for Database Management Systems

Database Users

Enterprise Application Architecture

Database Tools

Database Architecture

Database abstraction

External view

Conceptual view :

Internal Level :

Database Schema

Database Instance

Data Model

Database Languages

Relational Database

Relational data model

Keys

Database Design

ER Diagram

Normalization

First Normal Form

Second Normal Form

Third Normal Form

Boyce Codd Normal Form (BCNF)

Fourth Normal Form

Fifth Normal Form

Query Languages

Query Languages

Relational algebra

Tuple relational calculus

Domain relational calculus

SQL

XML

Application Design

JDBC

Storage Manager

Components of a Storage Manager

file manager

buffer manager

Authorisation and integrity manager

transaction manager

Storage access

Indexing and hashing

Parsing and translation

Optimization

Evaluation

Query Processing

Query Optimization

Transaction

Concurrency control

Recovery

Distributed Database Management System

Data Mining and Analysis