DBMS/Printable version


DBMS

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/DBMS

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Introduction

Introduction

edit
Data
raw facts and figures
Data Processing
performing operations on the input data to generate output.
Database
collection of related information about a particular enterprise.
Database Management System
collection of interrelated data, set of programs to access the data conveniently and efficiently.

File System

edit

Typically data is stored in files.

1.) flat files contains one table at a time generally

2.) flat files contains values at each row and separated with a special symbol so to reach the data you have to parse each row and obtaining an array of values and then you can query the data.

3.) to control the data in a file , you have to read it line by line and parse it and because of it they are slow

4.) there is no control mechanism in files

5.) dbms systems commonly has a special language called sql to modify and reach the data easily

6.) dbms systems has indexes to reach the data, not read it line by line

7.) there are much more control mechanisms that approves the correctness of data

8.) you can reach the data across networks by using a dbms system easily and safely

Need for Database Management Systems

edit

Data access through file systems suffers from the following problems.

  • minimal data redundancy
  • difficulty in accessing data
  • inconsistency
  • concurrent access
  • data isolation
  • integrity
  • atomicity
  • security

A good database management system solves all the above mentioned problems which a user generally faces in a file system.

Database Users

edit

Following are the types of database users:

  • Database Administrator(DBA) - a database administrator performs the following activity
    • Granting user authority to access the database
    • Enforcing security and integrity rules
    • Strategizing backup & recovery
    • enforcing security and integrity rules
  • Application programmers
  • Data analysts
  • Naive users

Enterprise Application Architecture

edit
three tier architecture
Modern enterprise application architecture consists of the following layers
  1. application client
  2. application server
  3. database system

Present trend in the enterprise architecture is moving towards n-tier architecture in which each of the above three layers is further distributed among multiple systems or layers

Database Tools

edit
  • query tools
  • administration tools

Database Architecture

edit

Following are the different database architecture types:

  • Centralized
  • Client-server
  • Parallel (multi-processor)
  • Distributed

Database abstraction

edit
  • physical abstraction
  • logical abstraction
  • view abstraction

To the design of complex data structure for the representation of data in the database. However since database system are often used by non computer professionals, this complexcity must be hidden from database system users. This is done by defining level of abstract as which the database may be viewed, There are logical view or external, conceptual view and internal view or physical view.

External view

This is the highest level of abstraction as seen by a user. This level of abstraction describe only part of entire database.

Conceptual view :

This is the next lower level of abstraction which is the sum total of user's views. This level describe what data are actually stored in the database. This level contain information about entire database in term of small number of relatively simple structure.

Internal Level :

This is the lowest level of abstraction at which one describe how the data are physically sorted.

Database Schema

edit

Definition : Overall logical design of data base. Schema contains 'No of records + Type of data + No of attributes'

  • External level or Sub schema
  • logical schema
  • physical schema

Database Instance

edit

Definition: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.

Definition: The information stored in database at the particular movement is called instance.

Also Known As: environment is the place where the data can be stored in manageable manner. Examples: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Data Model

edit

What is data model:it consists of some concepts to describe the structure of database i.e., data type, relations, and constraints that should hold on the data. E.g. ER model

  • XML data Model - semi structured
  • relational data model
  • object oriented
  • object relational
  • network
  • hierarchy

Database Languages

edit
  • procedural - what data is required and how to get this data
  • declarative - what data is required without describing how to get this data
  • data control language (DCL)
  • data manipulation language (DML)
  • data definition language (DDL)
  • data storage language


Relational Database

A database based on relational algebra or relation model is called relation database

Relational data model

edit
relation
a table in a relational database is called relation in the mathematical language of relational algebra. relations are unordered.
attribute
column of a table in database table is called attributes. columns or attributes have names.
domain
set of permissible values for an attribute ( or column) is called domain.
tuple
a row in the database table is called tuple in the mathematical language of relational algebra. order of tuples in a relation has no significance.
database
a database is a collection of multiple relations.
schema
a database design is called schema, alternatively, a schema can refer to namespace within a database.
cardinality of a relation
number of tuples in a relation is called cardinality of the relation.

Normalization theory deals with design of relational database schema.

Keys

edit
key
any subset of a relation is called key.
super key
a key is called super key if it is sufficient to identify a unique tuple of a relation.
candidate key
a minimal super key is called candidate key i.e. no proper subset of a candidate key is super key.
primary key
a candidate key chosen as a principal to identify a unique tuple of a relation.It Restrict User Inputs like "Null Values" , "Duplicate Values" etc.
foreign key
a key of a relation which is a primary key of some other relation in the relational schema.


Database Design

entity
nothing but it represents field of the database e.g.,

employee

entity set
collection of different fields
relationship
relationship set
domain
The set of possible values for an attribute is called the

domain of the attribute e.g.: The domain of attribute marital status is having four values: single, married, divorced or widowed.

The domain of the attribute month is having twelve values ranging from January to December.

Key attribute
The attribute (or combination of attributes) that is

unique for every entity instance e.g.: the account number of an account, the employee id of an employee etc.

attribute
e.g., gender
composite attribute
degree of relationship
the number of entity types involved in a

database

cardinality of relationship
The minimum and maximum values of this

connectivity is called the cardinality of the relationship e.g., one to one, one to many, many to many

weak entity
Entity that depends on other entity for its existence and
doesn’t have key attribute (s) of its own e.g. : spouse of employee
identifying relationship
non-identifying relationship
discriminator

ER Diagram

edit

Modelling concepts

  • Specialisation
  • Generalisation
  • Categorisation
  • Aggregation

Normalization

edit

Normalization is the formal process for deciding which attributes should

be grouped together in a relation.The process of normalization was 

first developed by E.F.CODD. Normalization is the process of evaluating and correcting the relation schemes to minimize data redundancies and thereby reducing anomalies.

First Normal Form

edit

As per the rule of the 1NF an attributes of a table cannot hold multiple value. It should hold only atomic value.

Second Normal Form

edit

Second Normal form is characterized by the property of functional dependency. a relation is in second normal form if it is in 1st normal form and every non key attribute is fully and functionally dependent on the primary key.

Third Normal Form

edit

transitive dependency : A relation is in third normal form , if it is in second normal form and

no transitive dependencies exist.

Suppose A,B and C are the three attributes of a relation(R) then if A->B(B depends on A) B->C(C depends on B) then we can say that "C" depends transitively on "A".

Boyce Codd Normal Form (BCNF)

edit

BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3rd normal form and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate key. now what is a determinant? consider the following functional dependency: A→B where A and B are attributes in relation R. it says that B is functionally dependent on A. here A is referred to as determinant and B is the dependent.

BCNF is slightly stronger than 3nF.

Fourth Normal Form

edit
multivalued dependencies are removed

Fifth Normal Form

edit

Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Anomalies can be : Insertion Anomaly,Deletion Anomaly or Modification/update Anomaly. Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.


Query Languages

Query Languages

edit
query
the retrieval of tuples from the relations of a relational schema
query language
a language used to retrieve information (tuples) from the relations of a relational schema.

types of query language

  • procedural language
  • non-procedural language

Mathematical query languages

  • Relational algebra
  • Tuple relational calculus
  • Domain relational calculus

Relational algebra

edit

relational algebra is a procedural query language

    • operations
      1. selection (r σ s) -
      2. projection (r ∏ s) -
      3. union: (r ∪ s)
      4. set difference: (r – s)
      5. Cartesian product: (r X s)
      6. rename: ( ρr )
      7. intersection:
      8. natural join:
      9. division:
      10. union:
      11. intersection:
      12. outer join

BY:NA

Tuple relational calculus

edit

A tuple variable is variable that takes on tuples of a particular relation schema as values. That is, every value assigned to a given tuple variable has the same number and type of fields. A tuple relation calculus query has the form {T I p(t)} where T is a tuple variable and p(T) denotes a formula that describes T; The result of this query is the set of all tuples t for which the formula p(T)evaluates to true with T=t.

Domain relational calculus

edit


SQL

SQL stands for Structured Query Language. SQL is a non procedural language.

References

relational algebra


XML

Extensible Mark-up Language.

XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define your own tags XML is designed to be self-descriptive XML is a W3C Recommendation

XML


Application Design

JDBC

edit

JDBC means Java Database Connectivity.


Storage Manager

Storage manager module of the database provides the interface between the following component:

  • data stored in the database
  • the application programs
  • queries submitted to the system

Components of a Storage Manager

edit

file manager

edit

function of the file manager is to manage disk space for storage and manage data structure used for storing information.

buffer manager

edit

The buffer manager reads disk pages into a main memory page as needed. The collection of main memory pages (called frames) used by the buffer manager for this purpose is called the buffer pool. This is just an array of Page objects. The buffer manager is used by (the code for) access methods, heap files, and relational operators to read / write /allocate / de-allocate pages. The Buffer Manager makes calls to the underlying DB class object, which actually performs these functions on disk pages.

Replacement policies for the buffer manager can be changed easily at compile time.

Authorisation and integrity manager

edit

Checks the authorization of users to access data and tests integrity constraints

transaction manager

edit

Storage access

edit

Indexing and hashing

edit

Many queries reference only a small proportion of records in a file. For example, finding all records at Perryridge branch only returns records where bname = ``Perryridge''. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.

Parsing and translation

edit

Optimization

edit

Evaluation

edit

As we know that, computer has two storage memories 1.primary storage 2.secondary storage


Query Processing

  • Query Processor
    • Query parsing
    • Query optimizer
    • Query evaluation engine : It executes low level instructions given by the DML and retrieves the information from the storage manager.

Query Optimization

edit


Transaction

transaction
collection logically related operations in a database application.
Transaction-management
ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
  • authorization engine


Concurrency control

Concurrency-control
ensuring the consistency of the database in spite of interaction among the concurrent transactions


Recovery

To restore a physical backup of a data file or control file is to reconstruct it and make it available to the Oracle database server. To recover a restored data file is to update it by applying archived redo logs and online redo logs, that is, records of changes made to the database after the backup was taken. If you use RMAN, then you can also recover data files with incremental backups, which are backups of a data file that contain only blocks that changed after a previous incremental backup.


Distributed Database Management System

Distributed Database management Systems are the software for managing databases stored on multiple computers in the network.

Distributed Database: It is a set of databases stored on multiple computers that typically appears to application as a single database. It stores related over two or more physically independent sites that requires distributed processing. A database that is composed of different database fragments / parts.


Data Mining and Analysis

Data mining is searching information which is already stored in such a database.