DBMS/Printable version


The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.


Introduction Edit

raw facts and figures
Data Processing
performing operations on the input data to generate output.
collection of related information about a particular enterprise.
Database Management System
collection of interrelated data, set of programs to access the data conveniently and efficiently.

File System Edit

Typically data is stored in files.

1.) flat files contains one table at a time generally

2.) flat files contains values at each row and separated with a special symbol so to reach the data you have to parse each row and obtaining an array of values and then you can query the data.

3.) to control the data in a file , you have to read it line by line and parse it and because of it they are slow

4.) there is no control mechanism in files

5.) dbms systems commonly has a special language called sql to modify and reach the data easily

6.) dbms systems has indexes to reach the data, not read it line by line

7.) there are much more control mechanisms that approves the correctness of data

8.) you can reach the data across networks by using a dbms system easily and safely

Need for Database Management Systems Edit

Data access through file systems suffers from the following problems.

  • minimal data redundancy
  • difficulty in accessing data
  • inconsistency
  • concurrent access
  • data isolation
  • integrity
  • atomicity
  • security

A good database management system solves all the above mentioned problems which a user generally faces in a file system.

Database Users Edit

Following are the types of database users:

  • Database Administrator(DBA) - a database administrator performs the following activity
    • Granting user authority to access the database
    • Enforcing security and integrity rules
    • Strategizing backup & recovery
    • enforcing security and integrity rules
  • Application programmers
  • Data analysts
  • Naive users

Enterprise Application Architecture Edit

three tier architecture
Modern enterprise application architecture consists of the following layers
  1. application client
  2. application server
  3. database system

Present trend in the enterprise architecture is moving towards n-tier architecture in which each of the above three layers is further distributed among multiple systems or layers

Database Tools Edit

  • query tools
  • administration tools

Database Architecture Edit

Following are the different database architecture types:

  • Centralized
  • Client-server
  • Parallel (multi-processor)
  • Distributed

Database abstraction Edit

  • physical abstraction
  • logical abstraction
  • view abstraction

To the design of complex data structure for the representation of data in the database. However since database system are often used by non computer professionals, this complexcity must be hidden from database system users. This is done by defining level of abstract as which the database may be viewed, There are logical view or external, conceptual view and internal view or physical view.

External view

This is the highest level of abstraction as seen by a user. This level of abstraction describe only part of entire database.

Conceptual view :

This is the next lower level of abstraction which is the sum total of user's views. This level describe what data are actually stored in the database. This level contain information about entire database in term of small number of relatively simple structure.

Internal Level :

This is the lowest level of abstraction at which one describe how the data are physically sorted.

Database Schema Edit

Definition : Overall logical design of data base. Schema contains 'No of records + Type of data + No of attributes'

  • External level or Sub schema
  • logical schema
  • physical schema

Database Instance Edit

Definition: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.

Definition: The information stored in database at the particular movement is called instance.

Also Known As: environment is the place where the data can be stored in manageable manner. Examples: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Data Model Edit

What is data model:it consists of some concepts to describe the structure of database i.e., data type, relations, and constraints that should hold on the data. E.g. ER model

  • XML data Model - semi structured
  • relational data model
  • object oriented
  • object relational
  • network
  • hierarchy

Database Languages Edit

  • procedural - what data is required and how to get this data
  • declarative - what data is required without describing how to get this data
  • data control language (DCL)
  • data manipulation language (DML)
  • data definition language (DDL)
  • data storage language

Relational Database

A database based on relational algebra or relation model is called relation database

Relational data model Edit

a table in a relational database is called relation in the mathematical language of relational algebra. relations are unordered.
column of a table in database table is called attributes. columns or attributes have names.
set of permissible values for an attribute ( or column) is called domain.
a row in the database table is called tuple in the mathematical language of relational algebra. order of tuples in a relation has no significance.
a database is a collection of multiple relations.
a database design is called schema, alternatively, a schema can refer to namespace within a database.
cardinality of a relation
number of tuples in a relation is called cardinality of the relation.

Normalization theory deals with design of relational database schema.

Keys Edit

any subset of a relation is called key.
super key
a key is called super key if it is sufficient to identify a unique tuple of a relation.
candidate key
a minimal super key is called candidate key i.e. no proper subset of a candidate key is super key.
primary key
a candidate key chosen as a principal to identify a unique tuple of a relation.It Restrict User Inputs like "Null Values" , "Duplicate Values" etc.
foreign key
a key of a relation which is a primary key of some other relation in the relational schema.

Database Design

nothing but it represents field of the database e.g.,


entity set
collection of different fields
relationship set
The set of possible values for an attribute is called the

domain of the attribute e.g.: The domain of attribute marital status is having four values: single, married, divorced or widowed.

The domain of the attribute month is having twelve values ranging from January to December.

Key attribute
The attribute (or combination of attributes) that is

unique for every entity instance e.g.: the account number of an account, the employee id of an employee etc.

e.g., gender
composite attribute
degree of relationship
the number of entity types involved in a


cardinality of relationship
The minimum and maximum values of this

connectivity is called the cardinality of the relationship e.g., one to one, one to many, many to many

weak entity
Entity that depends on other entity for its existence and
doesn’t have key attribute (s) of its own e.g. : spouse of employee
identifying relationship
non-identifying relationship

ER Diagram Edit

Modelling concepts

  • Specialisation
  • Generalisation
  • Categorisation
  • Aggregation

Normalization Edit

Normalization is the formal process for deciding which attributes should

be grouped together in a relation.The process of normalization was 

first developed by E.F.CODD. Normalization is the process of evaluating and correcting the relation schemes to minimize data redundancies and thereby reducing anomalies.

First Normal Form Edit

As per the rule of the 1NF an attributes of a table cannot hold multiple value. It should hold only atomic value.

Second Normal Form Edit

Second Normal form is characterized by the property of functional dependency. a relation is in second normal form if it is in 1st normal form and every non key attribute is fully and functionally dependent on the primary key.

Third Normal Form Edit

transitive dependency : A relation is in third normal form , if it is in second normal form and

no transitive dependencies exist.

Suppose A,B and C are the three attributes of a relation(R) then if A->B(B depends on A) B->C(C depends on B) then we can say that "C" depends transitively on "A".

Boyce Codd Normal Form (BCNF) Edit

BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3rd normal form and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate


'''' now what is a determinant? consider the following functional dependency: A→B where A and B are attributes in relation R. it says that B is functionally dependent on A. here A is referred to as determinant and B is the dependent.

BCNF is slightly stronger than 3nF.

Fourth Normal Form Edit

multivalued dependencies are removed

Fifth Normal Form Edit

Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Anomalies can be : Insertion Anomaly,Deletion Anomaly or Modification/update Anomaly. Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Query Languages

Query Languages Edit

the retrieval of tuples from the relations of a relational schema
query language
a language used to retrieve information (tuples) from the relations of a relational schema.

types of query language

  • procedural language
  • non-procedural language

Mathematical query languages

  • Relational algebra
  • Tuple relational calculus
  • Domain relational calculus

Relational algebra Edit

relational algebra is a procedural query language

    • operations
      1. selection (r σ s) -
      2. projection (r ∏ s) -
      3. union: (r ∪ s)
      4. set difference: (r – s)
      5. Cartesian product: (r X s)
      6. rename: ( ρr )
      7. intersection:
      8. natural join:
      9. division:
      10. union:
      11. intersection:
      12. outer join


Tuple relational calculus Edit

A tuple variable is variable that takes on tuples of a particular relation schema as values. That is, every value assigned to a given tuple variable has the same number and type of fields. A tuple relation calculus query has the form {T I p(t)} where T is a tuple variable and p(T) denotes a formula that describes T; The result of this query is the set of all tuples t for which the formula p(T)evaluates to true with T=t.

Domain relational calculus Edit


SQL stands for Structured Query Language. SQL is a non procedural language.


relational algebra


Extensible Mark-up Language.

XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define your own tags XML is designed to be self-descriptive XML is a W3C Recommendation


Application Design


JDBC means Java Database Connectivity.

Storage Manager

Storage manager module of the database provides the interface between the following component:

  • data stored in the database
  • the application programs
  • queries submitted to the system

Components of a Storage Manager Edit

file manager Edit

function of the file manager is to manage disk space for storage and manage data structure used for storing information.

buffer manager Edit

The buffer manager reads disk pages into a main memory page as needed. The collection of main memory pages (called frames) used by the buffer manager for this purpose is called the buffer pool. This is just an array of Page objects. The buffer manager is used by (the code for) access methods, heap files, and relational operators to read / write /allocate / de-allocate pages. The Buffer Manager makes calls to the underlying DB class object, which actually performs these functions on disk pages.

Replacement policies for the buffer manager can be changed easily at compile time.

data integrity manager Edit

transaction manager Edit

Storage access Edit

Indexing and hashing Edit

Many queries reference only a small proportion of records in a file. For example, finding all records at Perryridge branch only returns records where bname = ``Perryridge. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.

Parsing and translation Edit

Optimization Edit

Evaluation Edit

As we know that, computer has two storage memories 1.primary storage 2.secondary storage

Query Processing

  • Query Processor
    • Query parsing
    • Query optimizer
    • Query evaluation engine : It executes low level instructions given by the DML and retrieves the information from the storage manager.

Query Optimization Edit


collection logically related operations in a database application.
ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
  • authorization engine

Concurrency control

ensuring the consistency of the database in spite of interaction among the concurrent transactions


To restore a physical backup of a data file or control file is to reconstruct it and make it available to the Oracle database server. To recover a restored data file is to update it by applying archived redo logs and online redo logs, that is, records of changes made to the database after the backup was taken. If you use RMAN, then you can also recover data files with incremental backups, which are backups of a data file that contain only blocks that changed after a previous incremental backup.

Distributed Database Management System

Distributed Database management Systems are the software for managing databases stored on multiple computers in the network.

Distributed Database: It is a set of databases stored on multiple computers that typically appears to application as a single database. It stores related over two or more physically independent sites that requires distributed processing. A database that is composed of different database fragments / parts.

Data Mining and Analysis

Data mining is searching information which is already stored in such a database.