Last modified on 20 October 2014, at 21:51

DBMS

IntroductionEdit

Data 
raw facts and figures
Data Processing 
performing operations on the input data to generate output.
Database 
collection of related information about a particular enterprise.
Database Management System 
collection of interrelated data, set of programs to access the data conveniently and efficiently.

File SystemEdit

Typically data is stored in files.

1.) flat files contains one table at a time generally

2.) flat files contains values at each row and separated with a special symbol so to reach the data you have to parse each row and obtaining an array of values and then you can query the data.

3.) to control the data in a file , you have to read it line by line and parse it and because of it they are slow

4.) there is no control mechanism in files

5.) dbms systems commonly has a special language called sql to modify and reach the data easily

6.) dbms systems has indexes to reach the data, not read it line by line

7.) there are much more control mechanisms that approves the correctness of data

8.) you can reach the data across networks by using a dbms system easily and safely

Need for Database Management SystemsEdit

Data access through file systems suffers from the following problems.

  • minimal data redundancy
  • difficulty in accessing data
  • inconsistency
  • concurrent access
  • data isolation
  • integrity
  • atomicity
  • security

A good database management system solves all the above mentioned problems which a user generally faces in a file system.

Database UsersEdit

Following are the types of database users:

  • Database Administrator(DBA) - a database administrator performs the following activity
    • Granting user authority to access the database
    • Enforcing security and integrity rules
    • Strategizing backup & recovery
    • enforcing security and integrity rules
  • Application programmers
  • Data analysts
  • Naive users

Enterprise Application ArchitectureEdit

three tier architecture 
Modern enterprise application architecture consists of the following layers
  1. application client
  2. application server
  3. database system

Present trend in the enterprise architecture is moving towards n-tier architecture in which each of the above three layers is further distributed among multiple systems or layers

Database ToolsEdit

  • query tools
  • administration tools

Database ArchitectureEdit

Following are the different database architecture types:

  • Centralized
  • Client-server
  • Parallel (multi-processor)
  • Distributed

Database abstractionEdit

  • physical abstraction
  • logical abstraction
  • view abstraction

To the design of complex data structure for the representation of data in the database. However since database system are often used by non computer professionals, this complexcity must be hidden from database system users. This is done by defining level of abstract as which the database may be viewed, There are logical view or external, conceptual view and internal view or physical view.

External view

This is the highest level of abstraction as seen by a user. This level of abstraction describe only part of entire database.

Conceptual view :

This is the next lower level of abstraction which is the sum total of user's views. This level describe what data are actually stored in the database. This level contain information about entire database in term of small number of relatively simple structure.

Internal Level :

This is the lowest level of abstraction at which one describe how the data are physically sorted.

Database SchemaEdit

Definition : Overall logical design of data base. Schema contains 'No of records + Type of data + No of attributes'

  • External level or Sub schema
  • logical schema
  • physical schema

Database InstanceEdit

Definition: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.

Definition: The information stored in database at the particular movement is called instance.

Also Known As: environment is the place where the data can be stored in manageable manner. Examples: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).

Data ModelEdit

What is data model:it consists of some concepts to describe the structure of database i.e., data type, relations, and constraints that should hold on the data. E.g. ER model

  • XML data Model - semi structured
  • relational data model
  • object oriented
  • object relational
  • network
  • hierarchy

⋅ Database Languages ===

  • procedural - what data is required and how to get this data
  • declarative - what data is required without describing how to get this data
  • data control language (DCL)
  • data manipulation language (DML)
  • data definition language (DDL)
  • data storage language

Relational DatabaseEdit

A database based on relational algebra or relation model is called relation database

Relational data modelEdit

relation 
a table in a relational database is called relation in the mathematical language of relational algebra. relations are unordered.
attribute 
column of a table in database table is called attributes. columns or attributes have names.
domain 
set of permissible values for an attribute ( or column) is called domain.
tuple 
a row in the database table is called tuple in the mathematical language of relational algebra. order of tuples in a relation has no significance.
database 
a database is a collection of multiple relations.
schema 
a database design is called schema, alternatively, a schema can refer to namespace within a database.
cardinality of a relation 
number of tuples in a relation is called cardinality of the relation.

Normalization theory deals with design of relational database schema.

KeysEdit

key 
any subset of a relation is called key.
super key 
a key is called super key if it is sufficient to identify a unique tuple of a relation.
candidate key 
a minimal super key is called candidate key i.e. no proper subset of a candidate key is super key.
primary key 
a candidate key chosen as a principal to identify a unique tuple of a relation.It Restrict User Inputs like "Null Values" , "Duplicate Values" etc.
foreign key 
a key of a relation which is a primary key of some other relation in the relational schema.

Database DesignEdit

entity 
nothing but it represents field of the database e.g.,

employee

entity set 
collection of different fields
relationship 
relationship set 
domain 
The set of possible values for an attribute is called the

domain of the attribute e.g.: The domain of attribute marital status is having four values: single, married, divorced or widowed.

The domain of the attribute month is having twelve values ranging from January to December.

Key attribute
The attribute (or combination of attributes) that is

unique for every entity instance e.g.: the account number of an account, the employee id of an employee etc.

attribute 
e.g., gender
composite attribute 
degree of relationship 
the number of entity types involved in a

database

cardinality of relationship 
The minimum and maximum values of this

connectivity is called the cardinality of the relationship e.g., one to one, one to many, many to many

weak entity 
Entity that depends on other entity for its existence and
doesn’t have key attribute (s) of its own e.g. : spouse of employee
identifying relationship 
non-identifying relationship 
discriminator 

ER DiagramEdit

Introduction to Data Modeling at University of Texas, Austin]

Modelling concepts

  • Specialisation
  • Generalisation
  • Categorisation
  • Aggregation

NormalizationEdit

Normalization is the formal process for deciding which attributes should

be grouped together in a relation.The process of normalization was 

first developed by E.F.CODD. Normalization is the process of evaluating and correcting the relation schemes to minimize data redundancies and thereby reducing anomalies.

First Normal FormEdit

A relational schema is in first normal form if the domains of all the attributes are atomic. First Normal form is characterized by the property of atomicity. Means only one value for each tuple.

Second Normal FormEdit

Second Normal form is characterized by the property of functional dependency. a relation is in second normal form if it is in 1st normal form and every non key attribute is fully and functionally dependent on the primary key.

Third Normal FormEdit

transitive dependency : A relation is in third normal form , if it is in second normal form and

no transitive dependencies exist.

Suppose A,B and C are the three attributes of a relation(R) then if A->B(B depends on A) B->C(C depends on B) then we can say that "C" depends transitively on "A".

Boyce Codd Normal Form (BCNF)Edit

BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3rd normal form and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate

key.

'''' now what is a determinant? consider the following functional dependency: A→B where A and B are attributes in relation R. it says that B is functionally dependent on A. here A is referred to as determinant and B is the dependent.

BCNF is slightly stronger than 3nF.

Fourth Normal FormEdit

multivalued dependencies are removed 

Fifth Normal FormEdit

Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Anomalies can be : Insertion Anomaly,Deletion Anomaly or Modification/update Anomaly. Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.

Query LanguagesEdit

query 
the retrieval of tuples from the relations of a relational schema
query language 
a language used to retrieve information (tuples) from the relations of a relational schema.

types of query language

  • procedural language
  • non-procedural language

Mathematical query languages

  • Relational algebra
  • Tuple relational calculus
  • Domain relational calculus

Relational algebraEdit

relational algebra is a procedural query language

    • operations
      1. selection (r σ s) -
      2. projection (r ∏ s) -
      3. union: (r ∪ s)
      4. set difference: (r – s)
      5. Cartesian product: (r X s)
      6. rename: ( ρr )
      7. intersection:
      8. natural join:
      9. division:
      10. union:
      11. intersection:
      12. outer join

BY:NA

Tuple relational calculusEdit

A tuple variable is variable that takes on tuples of a particular relation schema as values. That is, every value assigned to a given tuple variable has the same number and type of fields. A tuple relation calculus query has the form {T I p(t)} where T is a tuple variable and p(T) denotes a formula that describes T; The result of this query is the set of all tuples t for which the formula p(T)evaluates to true with T=t.

Domain relational calculusEdit

SQLEdit

SQL stands for Structured Query Language. SQL is a non procedural language.

References

relational algebra

XMLEdit

Extensible Mark-up Language.

XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define your own tags XML is designed to be self-descriptive XML is a W3C Recommendation

XML

Application DesignEdit

JDBCEdit

JDBC means Java Database Connectivity.

Storage ManagerEdit

Storage manager module of the database provides the interface between the following component:

  • data stored in the database
  • the application programs
  • queries submitted to the system

Components of a Storage ManagerEdit

file managerEdit

function of the file manager is to manage disk space for storage and manage data structure used for storing information.

buffer managerEdit

The buffer manager reads disk pages into a main memory page as needed. The collection of main memory pages (called frames) used by the buffer manager for this purpose is called the buffer pool. This is just an array of Page objects. The buffer manager is used by (the code for) access methods, heap files, and relational operators to read / write /allocate / de-allocate pages. The Buffer Manager makes calls to the underlying DB class object, which actually performs these functions on disk pages.

Replacement policies for the buffer manager can be changed easily at compile time.

data integrity managerEdit

transaction managerEdit

Storage accessEdit

Indexing and hashingEdit

Many queries reference only a small proportion of records in a file. For example, finding all records at Perryridge branch only returns records where bname = ``Perryridge. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.

Parsing and translationEdit

OptimizationEdit

EvaluationEdit

As we know that, computer has two storage memories 1.primary storage 2.secondary storage

Query ProcessingEdit

  • Query Processor
    • Query parsing
    • Query optimizer
    • Query evaluation engine : It executes low level instructions given by the DML and retrieves the information from the storage manager.

Query OptimizationEdit

TransactionEdit

transaction 
collection logically related operations in a database application.
Transaction-management
ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
  • authorization engine

Concurrency controlEdit

Concurrency-control
ensuring the consistency of the database in spite of interaction among the concurrent transactions


RecoveryEdit

To restore a physical backup of a data file or control file is to reconstruct it and make it available to the Oracle database server. To recover a restored data file is to update it by applying archived redo logs and online redo logs, that is, records of changes made to the database after the backup was taken. If you use RMAN, then you can also recover data files with incremental backups, which are backups of a data file that contain only blocks that changed after a previous incremental backup.

Object Relational Database Management SystemEdit

Distributed Database Management SystemEdit

Distributed Database management Systems are the software for managing databases stored on multiple computers in the network.

Distributed Database: It is a set of databases stored on multiple computers that typically appears to application as a single database. It stores related over two or more physically independent sites that requires distributed processing. A database that is composed of different database fragments / parts.

Data Mining and AnalysisEdit

Data mining is searching information which is already stored in such a database.

Parallel DatabasesEdit

ReferencesEdit

Database e-learning

Online interactive SQL tutorial

A Gentle Introduction to SQL

Some topics of DBMS by Lecturer Manik Chand Patnaik

Primary Key and Indexing

1Keydata SQL Tutorial

w3resource SQL Tutorial