DBMS/Printable version
This is the print version of DBMS You won't see this message or any elements not part of the book's content when you print or preview this page. |
The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/DBMS
Introduction
Introduction
edit- Data
- raw facts and figures
- Data Processing
- performing operations on the input data to generate output.
- Database
- collection of related information about a particular enterprise.
- Database Management System
- collection of interrelated data, set of programs to access the data conveniently and efficiently.
File System
editTypically data is stored in files.
1.) flat files contains one table at a time generally
2.) flat files contains values at each row and separated with a special symbol so to reach the data you have to parse each row and obtaining an array of values and then you can query the data.
3.) to control the data in a file , you have to read it line by line and parse it and because of it they are slow
4.) there is no control mechanism in files
5.) dbms systems commonly has a special language called sql to modify and reach the data easily
6.) dbms systems has indexes to reach the data, not read it line by line
7.) there are much more control mechanisms that approves the correctness of data
8.) you can reach the data across networks by using a dbms system easily and safely
Need for Database Management Systems
editData access through file systems suffers from the following problems.
- minimal data redundancy
- difficulty in accessing data
- inconsistency
- concurrent access
- data isolation
- integrity
- atomicity
- security
A good database management system solves all the above mentioned problems which a user generally faces in a file system.
Database Users
editFollowing are the types of database users:
- Database Administrator(DBA) - a database administrator performs the following activity
- Granting user authority to access the database
- Enforcing security and integrity rules
- Strategizing backup & recovery
- enforcing security and integrity rules
- Application programmers
- Data analysts
- Naive users
Enterprise Application Architecture
edit- three tier architecture
- Modern enterprise application architecture consists of the following layers
- application client
- application server
- database system
Present trend in the enterprise architecture is moving towards n-tier architecture in which each of the above three layers is further distributed among multiple systems or layers
Database Tools
edit- query tools
- administration tools
Database Architecture
editFollowing are the different database architecture types:
- Centralized
- Client-server
- Parallel (multi-processor)
- Distributed
Database abstraction
edit- physical abstraction
- logical abstraction
- view abstraction
To the design of complex data structure for the representation of data in the database. However since database system are often used by non computer professionals, this complexcity must be hidden from database system users. This is done by defining level of abstract as which the database may be viewed, There are logical view or external, conceptual view and internal view or physical view.
External view
This is the highest level of abstraction as seen by a user. This level of abstraction describe only part of entire database.
Conceptual view :
This is the next lower level of abstraction which is the sum total of user's views. This level describe what data are actually stored in the database. This level contain information about entire database in term of small number of relatively simple structure.
Internal Level :
This is the lowest level of abstraction at which one describe how the data are physically sorted.
Database Schema
editDefinition : Overall logical design of data base. Schema contains 'No of records + Type of data + No of attributes'
- External level or Sub schema
- logical schema
- physical schema
Database Instance
editDefinition: The term instance is typically used to describe a complete database environment, including the RDBMS software, table structure, stored procedures and other functionality. It is most commonly used when administrators describe multiple instances of the same database.
Definition: The information stored in database at the particular movement is called instance.
Also Known As: environment is the place where the data can be stored in manageable manner. Examples: An organization with an employees database might have three different instances: production (used to contain live data), pre-production (used to test new functionality prior to release into production) and development (used by database developers to create new functionality).
Data Model
editWhat is data model:it consists of some concepts to describe the structure of database i.e., data type, relations, and constraints that should hold on the data. E.g. ER model
- XML data Model - semi structured
- relational data model
- object oriented
- object relational
- network
- hierarchy
Database Languages
edit- procedural - what data is required and how to get this data
- declarative - what data is required without describing how to get this data
- data control language (DCL)
- data manipulation language (DML)
- data definition language (DDL)
- data storage language
Relational Database
A database based on relational algebra or relation model is called relation database
Relational data model
edit- relation
- a table in a relational database is called relation in the mathematical language of relational algebra. relations are unordered.
- attribute
- column of a table in database table is called attributes. columns or attributes have names.
- domain
- set of permissible values for an attribute ( or column) is called domain.
- tuple
- a row in the database table is called tuple in the mathematical language of relational algebra. order of tuples in a relation has no significance.
- database
- a database is a collection of multiple relations.
- schema
- a database design is called schema, alternatively, a schema can refer to namespace within a database.
- cardinality of a relation
- number of tuples in a relation is called cardinality of the relation.
Normalization theory deals with design of relational database schema.
Keys
edit- key
- any subset of a relation is called key.
- super key
- a key is called super key if it is sufficient to identify a unique tuple of a relation.
- candidate key
- a minimal super key is called candidate key i.e. no proper subset of a candidate key is super key.
- primary key
- a candidate key chosen as a principal to identify a unique tuple of a relation.It Restrict User Inputs like "Null Values" , "Duplicate Values" etc.
- foreign key
- a key of a relation which is a primary key of some other relation in the relational schema.
Database Design
- entity
- nothing but it represents field of the database e.g.,
employee
- entity set
- collection of different fields
- relationship
- relationship set
- domain
- The set of possible values for an attribute is called the
domain of the attribute e.g.: The domain of attribute marital status is having four values: single, married, divorced or widowed.
The domain of the attribute month is having twelve values ranging from January to December.
- Key attribute
- The attribute (or combination of attributes) that is
unique for every entity instance e.g.: the account number of an account, the employee id of an employee etc.
- attribute
- e.g., gender
- composite attribute
- degree of relationship
- the number of entity types involved in a
database
- cardinality of relationship
- The minimum and maximum values of this
connectivity is called the cardinality of the relationship e.g., one to one, one to many, many to many
- weak entity
- Entity that depends on other entity for its existence and
doesn’t have key attribute (s) of its own e.g. : spouse of employee
- identifying relationship
- non-identifying relationship
- discriminator
ER Diagram
edit- entity relationship data model
- Introduction to Data Modeling at University of Texas, Austin
Modelling concepts
- Specialisation
- Generalisation
- Categorisation
- Aggregation
Normalization
editNormalization is the formal process for deciding which attributes should
be grouped together in a relation.The process of normalization was
first developed by E.F.CODD. Normalization is the process of evaluating and correcting the relation schemes to minimize data redundancies and thereby reducing anomalies.
First Normal Form
editAs per the rule of the 1NF an attributes of a table cannot hold multiple value. It should hold only atomic value.
Second Normal Form
editSecond Normal form is characterized by the property of functional dependency. a relation is in second normal form if it is in 1st normal form and every non key attribute is fully and functionally dependent on the primary key.
Third Normal Form
edittransitive dependency : A relation is in third normal form , if it is in second normal form and
no transitive dependencies exist.
Suppose A,B and C are the three attributes of a relation(R) then if A->B(B depends on A) B->C(C depends on B) then we can say that "C" depends transitively on "A".
Boyce Codd Normal Form (BCNF)
editBCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3rd normal form and BCNF are equivalent. A relation is in BCNF if and only if every determinant is a candidate key. now what is a determinant? consider the following functional dependency: A→B where A and B are attributes in relation R. it says that B is functionally dependent on A. here A is referred to as determinant and B is the dependent.
BCNF is slightly stronger than 3nF.
Fourth Normal Form
edit- multivalued dependencies are removed
Fifth Normal Form
editAny remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.
Anomalies can be : Insertion Anomaly,Deletion Anomaly or Modification/update Anomaly. Any remaining anomalies are removed.in this normal form we isolate semantically related multiple relationships.
Query Languages
Query Languages
edit- query
- the retrieval of tuples from the relations of a relational schema
- query language
- a language used to retrieve information (tuples) from the relations of a relational schema.
types of query language
- procedural language
- non-procedural language
Mathematical query languages
- Relational algebra
- Tuple relational calculus
- Domain relational calculus
Relational algebra
editrelational algebra is a procedural query language
- operations
- selection (r σ s) -
- projection (r ∏ s) -
- union: (r ∪ s)
- set difference: (r – s)
- Cartesian product: (r X s)
- rename: ( ρr )
- intersection:
- natural join:
- division:
- union:
- intersection:
- outer join
- operations
BY:NA
Tuple relational calculus
editA tuple variable is variable that takes on tuples of a particular relation schema as values. That is, every value assigned to a given tuple variable has the same number and type of fields. A tuple relation calculus query has the form {T I p(t)} where T is a tuple variable and p(T) denotes a formula that describes T; The result of this query is the set of all tuples t for which the formula p(T)evaluates to true with T=t.
Domain relational calculus
edit
SQL
SQL stands for Structured Query Language. SQL is a non procedural language.
References
relational algebra
XML
Extensible Mark-up Language.
XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define your own tags XML is designed to be self-descriptive XML is a W3C Recommendation
Application Design
JDBC
editJDBC means Java Database Connectivity.
Storage Manager
Storage manager module of the database provides the interface between the following component:
- data stored in the database
- the application programs
- queries submitted to the system
Components of a Storage Manager
editfile manager
editfunction of the file manager is to manage disk space for storage and manage data structure used for storing information.
buffer manager
editThe buffer manager reads disk pages into a main memory page as needed. The collection of main memory pages (called frames) used by the buffer manager for this purpose is called the buffer pool. This is just an array of Page objects. The buffer manager is used by (the code for) access methods, heap files, and relational operators to read / write /allocate / de-allocate pages. The Buffer Manager makes calls to the underlying DB class object, which actually performs these functions on disk pages.
Replacement policies for the buffer manager can be changed easily at compile time.
Authorisation and integrity manager
editChecks the authorization of users to access data and tests integrity constraints
transaction manager
editStorage access
editIndexing and hashing
editMany queries reference only a small proportion of records in a file. For example, finding all records at Perryridge branch only returns records where bname = ``Perryridge''. We should be able to locate these records directly, rather than having to read every record and check its branch-name. We then need extra file structuring.
Parsing and translation
editOptimization
editEvaluation
editAs we know that, computer has two storage memories 1.primary storage 2.secondary storage
Query Processing
- Query Processor
- Query parsing
- Query optimizer
- Query evaluation engine : It executes low level instructions given by the DML and retrieves the information from the storage manager.
Query Optimization
edit
Transaction
- transaction
- collection logically related operations in a database application.
- Transaction-management
- ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
- authorization engine
Concurrency control
- Concurrency-control
- ensuring the consistency of the database in spite of interaction among the concurrent transactions
Recovery
To restore a physical backup of a data file or control file is to reconstruct it and make it available to the Oracle database server. To recover a restored data file is to update it by applying archived redo logs and online redo logs, that is, records of changes made to the database after the backup was taken. If you use RMAN, then you can also recover data files with incremental backups, which are backups of a data file that contain only blocks that changed after a previous incremental backup.
Distributed Database Management System
Distributed Database management Systems are the software for managing databases stored on multiple computers in the network.
Distributed Database: It is a set of databases stored on multiple computers that typically appears to application as a single database. It stores related over two or more physically independent sites that requires distributed processing. A database that is composed of different database fragments / parts.
Data Mining and Analysis
Data mining is searching information which is already stored in such a database.