Oracle and DB2, Comparison and Compatibility/Process Model/Optimizer

Overview

Speed first.

The way that users interact with their data in a database is through queries, and in multi-user systems query volumes can be very high. Depending on the type of query, individual queries can be very complex as well. These twin agents of volume and complexity mean that the efficiency with which the DBMS dispatches user requests is paramount. To achieve this, the query optimizer (known simply as the optimizer) analyzes user queries and figures out the best way to execute them. The optimizer works by enumerating the possible ways that a query can be broken up and implemented (the candidate query plans) and calculating the overhead (or cost) of each plan. The cost accounts for such things as how much disk I/O will be incurred and how much CPU will be needed. This information is made available to the optimizer through the data dictionary held in memory, which is populated with statistics gathered from the database tables.
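Both products can show you the plan the optimizer has chosen for a statement. The sketch below is a minimal illustration using Oracle's EXPLAIN PLAN statement with the DBMS_XPLAN package, and DB2's EXPLAIN facility (which assumes the explain tables have already been created, typically from the EXPLAIN.DDL script shipped with DB2); the employees table and department_id column are hypothetical names used only for illustration.

 -- Oracle: record the chosen plan in the plan table, then format it.
 EXPLAIN PLAN FOR
   SELECT * FROM employees WHERE department_id = 50;
 
 SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
 
 -- DB2: populate the explain tables with the plan for the same statement;
 -- the db2exfmt tool then formats their contents into a readable report.
 EXPLAIN PLAN FOR
   SELECT * FROM employees WHERE department_id = 50;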

This method of optimization is called cost-based optimization, and it is the method used by both Oracle and DB2. There is another, historical, optimization method based on rules, called appropriately enough rule-based optimization. Rule-based optimization is no longer used (Oracle desupported its rule-based optimizer as of Oracle Database 10g) and is mentioned here only for completeness.

The optimizer works with index statistics in the data dictionary, so by extension the efficiency of the optimizer is a function of the accuracy of those statistics. While the optimizer uses these statistics to cost each query plan, the rate at which the statistics go stale depends on the insert, update and delete activity on the table. An individual table's volatility determines how frequently its index statistics need to be refreshed, but this information is not immediately available to the optimizer, and updating the statistics requires that the tables be scanned completely – not a trivial undertaking.
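Refreshing statistics is an explicit administrative step in both products. The sketch below shows the usual tools, Oracle's DBMS_STATS package and DB2's RUNSTATS utility; the HR schema and EMPLOYEES table are hypothetical names, and the right options depend on the table in question.

 -- Oracle: gather statistics for one table, cascading to its indexes.
 BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
     ownname => 'HR',
     tabname => 'EMPLOYEES',
     cascade => TRUE);
 END;
 /
 
 -- DB2 (command line processor): collect table, distribution and
 -- detailed index statistics in a single pass over the table.
 RUNSTATS ON TABLE hr.employees WITH DISTRIBUTION AND DETAILED INDEXES ALL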

For large tables, and for large numbers of tables that are heavily modified, the statistics can quickly become out of date. In this case, it's possible that the optimizer will not pick the best execution path. Additionally, unusual circumstances, such as small tables that have indexes (tables that fit within a few blocks or pages), can trip up the optimizer. Because the optimizer uses index statistics, if there are indexes on these small tables the optimizer may use them even though a table scan would be more efficient. This is because the whole table can be read with a single I/O. To use the index on these tables, the index itself must be read (both Oracle and DB2 store indexes separately from table data), and then the block (or page) must be read – two expensive I/O operations. Depending on the extent size in either database, the entire table could be fetched with a single I/O. Because of this, both Oracle and DB2 allow you to override the decisions made by the optimizer, as shown in the sketch below.
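The override mechanisms differ between the two products. Oracle accepts inline hints embedded in the statement text, while DB2 uses externally defined optimization profiles (XML documents associated with a statement) rather than inline hints. The sketch below shows an Oracle hint forcing the full table scan discussed above; the employees table and the alias e are hypothetical names.

 -- Oracle: the FULL hint tells the optimizer to scan the table rather
 -- than use an index – cheaper when the table fits in a few blocks.
 SELECT /*+ FULL(e) */ *
   FROM employees e
  WHERE employee_id = 100;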

Overall, it's like tuning the engine in a car: you want to maximize performance and minimize fuel consumption, and since there is an obvious trade-off between these two objectives, how you balance them is very important. While it is impossible to achieve a perfect balance between unlimited performance and zero resource consumption, the objective is to come close, and these trade-offs can have dramatic consequences – you can make decisions that cripple performance and hog resources. Tinkering with values can also become an exercise in exponentially diminishing returns, and the general consensus is that once you get it where you need it, you nail it down and leave it alone.