Oracle and DB2, Comparison and Compatibility/Architecture/Oracle

Oracle edit

Oracle Storage Model edit

The primary structures that make up the physical database on disk are:


• Data Files

• Control Files

• Undo Logs

• Temporary Files

Oracle Database Disk Structures
 


Data Files hold Data, Indexes, Undo Segments and Temporary data. Data and Indexes can be stored in separate files, or together in the same data file.

Control Files contain the structure of the database. The control file is a binary file and there should be at least 3 for every database you have (for redundancy purposes). The control files look exactly alike, and they should be on separate drives and controllers. The contents of the control files are such things as the database name, create date and the canonical path of all the data files and checkpoint information for each database. They are used to mount the database, open it and access it. They store synchronization data in the event that the database needs to be recovered.

The Undo / Redo logs have 3 major components.


• All the DML (Insert, Select, Update and Delete). All the changes made to Indexes and Tables get stored here.

• All the DDL (i.e. Create, Drop, Alter)

• All the commits


The online undo logs are a record of all the changes made to the database. Because these files are so important in maintaining the integrity of the database in the event of a recovery, each undo logfile is duplicated, and information is written to both members (multiplexing). The log files are then organized in groups, so that as one set of logfiles fills up, Oracle does a checkpoint and log switch, after which, the next set of files in the group are written to. There will be at least two sets of logfile groups for a database, but there will probably be more (up to a maximum of 255). The logfiles are written to cyclically, when one group of logfiles is filled, the next group in the series is written to. These files should be on separate drives and controllers (again, for redundancy). When the last group in a series is written to, the process begins again with the first group in the series. An additional level of security can be obtained by archiving the log files as they fill up, so that when they are written to again, no data is overwritten – this is called running in archive log (ARCHIVELOG) mode. You need to ensure that there is enough space to hold the archived log files since the database will hang if it tries to archive to a directory that is full.

The 3 structures mentioned above, redo / undo log files, control files and data files, plus temporary files created during database operations collectively define the DATABASE.

Outside of the database itself is the Server Parameter file (SP file). This is a binary file that is maintained with the ALTER SYSTEM command. There are about 300 parameters in this file. There is also the Oracle Password File that is maintained by ORAPWD.

Oracle Memory Model edit

Oracle manages memory in three distinct chunks. If we think of memory being organized from the ‘top down’ at the top is memory allocated for the database instance and background processes– the Shared Global Area (SGA). The SGA is a collection of shared memory structures that contain data and control information for one Oracle Database instance. Each database instance has one SGA where buffers are allocated when the instance starts. SGA should fit into real memory to avoid memory paging. It is allocated when the DBMS is started (startup), and is freed when it is stopped (shutdown). It is the SGA and the processes that operate against these structures that are called the DATABASE INSTANCE.

Below this, there is memory allocated for each server and background process, the Program Global Area (PGA). Each PGA is allocated in non-shared memory when a server process starts, and it contains data and control information for a server process. This means that access to the PGA is exclusive to that server process. When the term Instance PGA is used, it refers to all of the individual memory segments in use, even though they are not shareable.

For each piece of code, either Oracle Database code or User Code, there is a Software Code Area (SCA). Oracle Database code and User Code are stored in separate memory locations, the Oracle code is isolated in a more secure location.

So, from the top down we have:


• The Shared Global Area (one per DBMS instance)

• The Program Global Areas (one per server and background process)

• The Software Code Areas (for each Oracle or User program)

Main Areas of Oracle Memory Model
 


Oracle System Global Area edit

Oracle System Global Area
 


The main memory pools in the system Global Area are:


• The Buffer Cache

• The Shared Pool

• The Redo Buffer

• The Large Pool

• The Java Pool

• The Streams Pool


Buffer Cache edit

The buffer cache is visible to all users connected to the Oracle Instance. The database buffer cache contains four major segments – Data, Indexes (and data with indexes), Undo Logs and Temporary files. Temporary files are memory structures that are built to hold transient data, such as the intermediate results of sorts. Because the buffer pool holds blocks of data read from disk, it is most efficient if the buffer pool block size is a multiple of the operating system block size. It is important that the database buffer pool be also large enough for applications that perform a lot of disk I/O, such as Online Transaction Processing (OLTP) applications.

The way Oracle uses the buffer cache is that when it needs a piece of data, the first place it looks is in the buffer cache, since a) memory reads are quicker than disk reads, and b) Oracle strives to keep the most frequently used, and the most likely to be used data in memory. If the data does not exist in the buffer cache, then Oracle will read the required data from stable storage (disk). After reading the data, it is placed into the buffer cache so that any subsequent access to the same data will take advantage of the cached data.

The buffer cache is organized into two areas, the Write List and the Least Recently Used (LRU) list. The write list holds modified blocks that are waiting to be written to disk. Since disk I/O is expensive, blocks are not flushed to disk as soon as they are modified, but are collected in the write list as candidates for a single, less expensive write that can move several blocks at once (a delayed write).

The LRU list holds empty buffers (free buffers), buffers in use (so called ‘pinned buffers’ because they are pinned in memory to prevent them being evicted during use), and dirty (modified) buffers waiting to be moved to the write list. When a process accesses a buffer, the buffer is moved to the end of the LRU list called, appropriately enough, the Most Recently Used end. This process of migrating buffers through the LRU list as they are accessed means that those that are infrequently accessed move to the least recently used end and become candidates for eviction from the cache by the LRU algorithm. Since it is desirable to keep the most frequently accessed data available in shared memory (i.e. data, indexes, SQL statements and query plans), there is a situation that can arise when there are sets of data that are being continually accessed that will be pinned in memory and not written to disk. To make sure that this data is written to disk (for permanence) and saved in the undo logs, there is a checkpoint process (CKPT) that runs periodically to do just that by signaling the database writer process(es) DBW0, DBW1, etc... (for efficiency, there can be more than one database writer as we will see later).


Having seen how the buffers are organized, the way that processes use the buffer cache is that they search the buffer cache for data first. When the required data is found in the buffer cache this is called a ‘cache hit’. On a ‘cache miss’ the required data must be read from disk. On a cache miss, Oracle will search for free buffers in the LRU list starting from the least used end, and move any dirty buffers it finds to the write list as it goes. The process will search until it finds a free buffer or gets to the MRU end of the list. If the process finds a free buffer, it will read the data block it needs from disk into the buffer and then move the buffer to the MRU end of the list. If there are no free buffers available, the process signals a database writer process to write modified buffers back to disk.

The Shared Pool edit

The shared pool is a major component of the SGA, it contains many structures and the main structures are identified below. An example of these structures are the areas used to hold parsed SQL statements. These statements are loaded and unloaded into memory regularly and hence require memory allocation and de-allocation. To manage Shared Pool memory an LRU algorithm similar to the one in the buffer pool is used.

This algorithm maintains frequently accessed items in the shared pool even after the process that created them terminates. Similar to the buffer pool LRU algorithm, the database server will examine the shared pool for a pre-existing parsed instance of any SQL statement that is submitted for execution. If one is found, it will be used. If not, a new SQL area is allocated in the shared pool and once the SQL has been parsed and an execution plan built, this will be used. In the event that database objects that are referenced by structures in the SQL areas are modified, the SQL areas are invalidated, and this results in the statement being re-parsed and loaded next time it is run.

The shared pool consists of:


• The Library Cache

• The Result Cache

• The Dictionary Cache


The Library Cache edit

The library cache is an instance level memory structure available to all concurrent users. It holds shared and private SQL areas, PL/SQL procedures and packages.

Shared SQL Areas and Private SQL Areas edit

Shared SQL Areas are used for user process SQL statements, and by definition many user processes can share this code. There are instances, such as shared server configuration (mentioned below) and for blocks of PL/SQL code where information about the executing SQL unit needs to be isolated to prevent concurrent users stepping on each others package variables, and these units use a Private SQL Area. There is a shared SQL area and a private SQL area for each SQL statement running in the instance. The SQL Areas contain an SQL Statements’ parse tree and execution plan. The database server allocates memory each time an SQL statement is parsed and this allocation is dynamic, based on the size and complexity of the SQL statement. If an SQL area is de-allocated, then the SQL statement will be re-parse when it is executed again. In this way the database can reduce the parsing overhead for SQL which is frequently executed. This is possible since SQL tends to change very infrequently. Since a single Shared SQL statement is usable by all instance users this also saves memory.

In a shared server configuration, many users processes can share a smaller number of server processes, effectively increasing the number of concurrent users for the instance. The way this is achieved is that the user processes are connected to a dispatcher, which brokers user process requests among a shared pool of idle server processes through a common queue. In this case though, each user must have a separate copy of the statement's private SQL area.

PL/SQL Program Units and the Shared Pool edit

PL/SQL statements (functions, triggers, procedures, packages, and anonymous blocks) are processes similarly to individual SQL statements. A Shared SQL Area holds the parsed statements, and a Private SQL Area holds information specific to the user process executing the PL/SQL block, such as local and global variables, package variables, and SQL buffers. In this way, each user running a PL/SQL program unit will utilize the same Shared SQL Area, but retain values specific to its session in a Private SQL Area.

The Result Cache edit

The result cache is composed of two areas; the SQL query result cache and PL/SQL function result cache.

SQL Query Result Cache and the PL/SQL Function Result Cache

These areas are used to hold the results of executed SQL statements and PL/SQL Functions for subsequent re-use. Their use can be controlled by a parameter setting that turns this function on, or by hints in your SQL. If the results cache(s) reference an object that is changed, the database invalidates the results set so that outdated information is not returned for subsequent queries or functions – they are run again to refresh the cache.

Dictionary Cache edit

The data dictionary is a collection of database tables and views containing reference information about the database, its structures, and its users. Oracle Database accesses the data dictionary frequently during SQL statement parsing. This access is essential to the continuing operation of Oracle Database. The data dictionary is accessed so often by Oracle Database that two special locations in memory are designated to hold dictionary data. One area is called the data dictionary cache, also known as the row cache because it holds data as rows instead of buffers (which hold entire blocks of data). The other area in memory to hold dictionary data is the library cache. All Oracle Database user processes share these two caches for access to data dictionary information.

The Redo Buffer edit

The redo log buffer stores information about changes made to the database by INSERT, UPDATE, DELETE, CREATE, ALTER, or DROP operations as redo entries. Entries are made into the redo log buffer from copies of the user's memory. These entries are used to reconstruct the database in the event of a recovery. Because the entries are important for maintaining the consistency of the database, they are also written to the active redo log group on disk by the log writer (LGWR) process. Since the redo buffer is written to disk, the redo log buffer is cyclically overwritten – when a process gets to the end of the redo log buffer, it continues writing again at the start. Because of this, the redo log buffer is allocated as a continuous space in memory.

The Large Pool edit

The Large Pool is an optional area that can be allocated in the SGA for processes and operations that require large amounts of memory. For example, in the shared server configuration mentioned earlier where transactions cross multiple databases, the large pool can be used to hold shared SQL for these transactions instead of the Shared SQL Area. The Large Pool is better suited for such things as periodic database maintenance utilities that require large amounts of memory (i.e. backup and restore), server I/O processes and buffers for parallel operations (where queries are broken into separate units and executed simultaneously.) In these cases, the large pool prevents these memory heavy processes from impacting the shared pool.

The Java Pool edit

The Java Pool holds Java Code and Data for the Java Virtual Machine – essentially the user written Java programs and data

The Streams Pool edit

Oracle Streams is a built-in database replication feature used for the propagation of data, transactions and events in a data stream between databases or within a single database. Streams works by ‘capturing’ information, queuing it as a message and then ‘applying’ this information elsewhere. Streams is used in replication for capturing database changes and then applying them elsewhere, but streams can be used to propagate any other information in the database. The Streams pool holds buffered queue messages, and allocates memory for the Streams capture and apply processes.

Oracle Program Global Area edit

Oracle Program Global Area
 


The Program Global Area (PGA) is memory allocated for each server process and consists of two areas, Session Memory and the Private SQL area. The contents of the PGA vary, depending on where Shared Server is running. The Persistent Area, which is used to store information about bind variables is held in the SGA for Shared Server operation. For Dedicated Server operation, the persistent area is held in the PGA.

Session Memory is where session related variables (such as logon information) are held. Session memory is private if the server is a dedicated server, if servers are shared session memory is also shared.

The Private SQL Area

There is a usage overlap between the SGA and PGA for Private SQL Areas. The Private SQL Area operates the same way in the PGA as it does in the SGA, the only difference being in how the server is configured. If the server is dedicated, the private SQL area is held in the PGA, if the server is shared it is held in the SGA.

Each session that issues an SQL statement has its own private SQL area which contains bind variables, query execution state variables and query execution work areas. If several users submit the same SQL statement, they all use a single shared SQL area. There is a 1:n relationship between a shared SQL area and private SQL area(s).

In the private SQL area has 3 separate areas, one for cursor and SQL, the private SQL area itself and SQL work areas.


• The Cursor and SQL Areas

• The Private SQL Area Components

• SQL Work Areas


Cursors and SQL Areas edit

Cursors are a powerful construct for manipulating multiple rows of fetched data. The cursor holds the rows and allows navigation through them either serially or hierarchically. There is information that needs to be stored about the cursor, such as where you are in a block of fetched rows, and Oracle allows you to name cursors and use this reference during program execution. Likewise handles to specific parts of the private SQL area can be used as a named resource. The memory structure that holds cursor and SQL Area named resources is the Cursor and SQL Area.

The individual user process are responsible for managing private SQL. Private SQL area persist in memory until the SQL statement handle is freed or the cursor is closed, at this time memory allocated to the runtime area is released, but the persistent area stays in memory. The persistent area contains query execution state information (i.e. for keeping track of the progress of a table scan), and SQL work areas.

SQL work areas are used for memory-intensive operations such as:

• Sort-based operators (order by, group-by, rollup, window function)

• Hash-join

• Bitmap merge

• Bitmap create

A sort will use a work area (called a sort area) to perform an in-memory A hash-join uses a work area (called the hash area) to build a hash table. If the amount of data to be processed by these two operators does not fit into a work area, it will be split and the overflow will be written to temporary storage on disk. As a general rule bigger SQL work areas can improve performance but will consume more memory, smaller work areas run the risk of overflowing onto disk and hammering performance. Because of this the size of a work area can be controlled, and the database will do this automatically for you if automatic PGA memory management is enabled.

Oracle Software Code Area edit

Software code areas are where Oracle Database code (the code used to run the actual database) and User Code (user written application code) are stored. These areas are read only and can be either shared or non-shared. Since the code itself does not normally change during the course of time, these areas are usually static and only change when software is updated. Oracle database specific code is typically stored in a more exclusive and protected part of memory than user code. Only Oracle gets to change their code, but users can update their own code as they wish. As far as possible Oracle strives to implement shared code for performance reasons, but it is not possible to implement the whole database code set this way, so some of it is necessarily non-shared, however multiple Oracle instances can share tha same Oracle code area if it is running on the same machine (as noted in Figure 2. 3 main areas of Oracle memory model)

Other structures in memory are System Views which are virtual tables of useful information, such as dba_datafiles that will give you information about the physical data files and how they are mapped to tablespaces. There are also dynamic performance views (prefaced by V$.) that you can write queries against (as well as the dba_datafiles) to see structures in the shared pool. V$_SQLAREA and V$_SQL will give you a dynamic view into the SGA, and V$_Controlfile gives the full pathname of control files

Oracle Process Model edit

Main Oracle Processes
 


Legend:

• D000 - Dispatcher

• DBWn - Database Writer (there may be several, designated DBW1 → DBW9 and DBWa → DBWj)

• LGWR – Log Writer Process

• CKPT – Checkpoint Process

• ARCn – Archiver Process (there may be several, designated ARC0 → ARCn)

• RECO – Recover Process

• PMON – Process Monitor

• SMON – System Monitor

Oracle uses processes to run its core database components and the user programs and tools that work against the database and its data. There are two processes that are run for each connected user, the user process (applications and database tools and utilities) and the database processes. Oracle can be run as a dedicated server, where each user process has a dedicated server process that services it, or as a shared server, where multiple user processes are connected to a dispatcher, and shared server processes service user process requests using request and response queues. In each case, multiple users can work with the database at the same time, but in dedicated mode, each user process has a dedicated server process. In multi-process mode, multiple oracle server processes run the Oracle database code.


User Processes

A database with no connected users is a thing of academic interest only, the reason for having the database is for it to service user requests, so starting from the perspective of a user, we will look at how Oracle handles concurrent users. As mentioned previously, Oracle can be configured to run as either a dedicated server, or a shared server. Shared User Processes and Dedicated User Processe perform the same function, the primary difference is how the user processes are handled. The Oracle database processes that run the database remain the same (for example, regardless of how user processes are connected to the database, the same database processes, such as log writing and checkpointing are running the actual database itself). Depending on configuration, there can be multiple Oracle processes running to perform database operations (such as writing data from the buffer cache to disk), but again, these are multiple instances of the same process, they are not different processes in each case.

Starting from the users perspective, their processes are connected to an oracle instance. If the user is on the same physical server, the connection is handled by Inter-Process Communication (IPC). If user is on a remote machine, the connection is handled by network software (such as TCP/IP). A connection is stateless, it can be thought to be a pipe between the user process and the database instance. When the user starts to work across this connection they establish a session, and the session is stateful. If a user uses a client program to administer the database, the session will start with a logon and will continue until logoff or the user disconnects from the database. Connections and sessions are handles by processes, and as such they both consume resources Connections handle the connection of users to the database, and their operation is concerned with the maintenance of this pathway, whether or not anything is going on. Sessions are what are used by user processes to perform work, and the maintenance and operation of session processes need not consider the connection (that is handled by the connection process), session processes make sure that the state of the session is maintained. A single user can have many sessions running against a database instance – the same user may be logged in several times and doing different things across several sessions.

To support a shared server connection the following processes are used: - A network listener will connect the user process to a dispatcher or dedicated server process - A dispatcher will direct user sessions to a shared server pool of processes - Shared Server processes At least one dispatcher process must be running for each network protocol the database supports. The listener will determine if the user process can use a shared user process and will direct the process to a dispatcher. If the user process cannot use a shared connection, the listener will create a dedicated server process and connect the user process to that. Shared Server connection is a scalability feature in the Oracle database, and it does not require any change to the user applications connecting to the instance.

If a user process uses a dispatcher, the dispatcher will place the user requests on a request queue for servicing by the next available server process. This queue exists in the SGA and is serviced by a single shared user process on a ‘first in, first out basis’. When the server process finishes, it places a response in the response queue for the dispatcher handling this request and the dispatcher returns this to its associated user process. Both the request queue and the response queue are dedicated to a dispatcher instance and are memory structures in the Shared Global Area (SGA).


Oracle Processes

Oracle uses two categories of process to run the database server, server processes (that handle user requests) and background processes (to run the database itself).

Server Processes running on behalf of the users applications are used to parse and run SQL statements, and retrieve data from disk into the SGA.

Working in concert with the user process are a number of background processes. There is a large number of background processes that can be running in an Oracle database instance, and not all of them are required. The following is an overview of the main processes concerned with the Oracle instance as identified in Figure n. Main Oracle Processes.


• Database Writer Process (DBWn)

The database writer takes data from the database buffer cache and writes to database data files in the background. There can be several of these background writer tasks running (DBW1 through DBW9, and DBWa through DBWj). Normally one writer is sufficient, but in systems that heavily modify data, you can configure the number of writers that will be started through the parameter file. If no value is set for the number of writers, Oracl determines how many writers will be started for you based on the umber of CPU’s available. The database writers perform asynchronous writes for performance. Writes are also deferred. Data is not written from buffers when data changed with an Insert, Update or Delete is Committed. This keeps frequently access data from being ejected from the buffer cache and also allows the writer to write larger pieces of data to disk infrequently as opposed to small changes as soon as they occur. This is the so-called ‘batched multi block’ write. If a set of blocks are needed that are not in the database buffer pool writes modified buffers that have not been accessed recently (called cold dirty buffers) from memory to disk to free up space. This frees up cold clean buffers for data read from disk into memory. The LRU algorithm keeps the most recently accessed buffers in memory to maximize the potential of a cache hit.


• Log Writer Process (LGWR)

The Log Writer is responsible for taking information from the undo log buffer pool and writing it to the current group of online undo logs. The log writer will write a redo log entry commit record when transactions commit, and writes the commit and all changes associated with that commit to disk. When redo log entries are written to disk, only those entries added to the buffer since the last LGWR write are written to disk (so that duplicates are not recorded to disk). LGWR writes to disk every three seconds and when an undo log becomes one third full. LGWR will also write to disk if the database writer process(es) DBWn write dirty buffers to disk. Before DBWn writes to disk, all the redo entries associated with the changed data are written by LGWR.


• Checkpoint Process (CKPT)

Frequently updated blocks have the possibility of staying in memory for a long time, they never get picked up by the LRU. The checkpoint process will signal a background write (DBWn) of updated committed blocks to on disk data files. You can schedule the frequency of checkpoint processes (via the Server Parameter (SP) file), or can force a checkpoint.


• System Monitor Process (SMON)

The system monitor communicates with all background processes. The system monitor manages the database and does instance and crash recovery in the event of a crash. It does other housekeeping such as recovering space in unused temporary segments, and merging free space in the data files.


• Process Monitor Process (PMON)

PMON –The Process Monitor monitors the other processes and will handle deadlocks and clean up after aborted processes. The process monitor stores information with the network listener about the instance and dispatcher processes. It checks the status of any dispatcher and shared server processes and will restart any that have stopped.

Both PMON and SMON check themselves regularly to see if they are needed, and each can be called by other processes if required.


• Recovery Process (RECO)

The recovery process is only present on instances with the distributed database configuration. RECO resolves distributed transaction failures. If the status of a distributed transaction is ‘in-doubt’, the nodes’ recovery process automatically connects to other databases involved in that transaction. RECO then resolves all in-doubt transactions by removing any rows associated with the resolution of an in-doubt transaction from each database's pending transaction table. If RECO cannot connect to a remote server it will time out and retry later.


• Archiver Processes (ARCn)

The archiver process (ARCn) copies redo log files to offline storage on each log switch. The archiver process(es) only exist on instances that have log archiving turned on. There can be upto 10 ARCn processes running on an instance, and these processes are kicked off by LGWR depending on system load.


• Dispatcher Processes (Dnnn)

There can be a number of dispatcher processes database instance. There will be at least one per supported network protocol (i.e. TCP/IP). Because the dispatcher is used to handle multiple user processes, the DBA can start a number of dispatchers initially, and increase or decrease this number as needed while the database is running


• Shared Server Processes (Snnn)

Each shared server process serves multiple client requests in the shared server configuration.

The Program Global Area of a shared server process contains no user data since it needs to be accessible by all shared server processes, it contains only stack space and process-specific variables. The session related data (and hence user data) is stored in the SGA. Oracle will dynamically adjust the number of shared server processes based on the length of the request queue.