MySQL/Optimization
Before Starting To Optimise
When the database seems to be "slow", first consider all of the following points, as e.g. making a certain query entirely unnecessary by using a more sophisticated algorithm in the application is always the most elegant way of optimising it :)
- Finding the bottleneck (CPU, memory, I/O, which queries)
- Optimising the application (remove unnecessary queries or cache PHP generated web pages)
- Optimising the queries (using indices, temporary tables or different ways of joining)
- Optimising the database server (cache sizes etc)
- Optimising the system (different filesystem types, swap space and kernel versions)
- Optimising the hardware (sometimes indeed the cheapest and fastest way)
To find those bottlenecks the following tools have been found to be helpful:
- vmstat
- to quickly monitor cpu, memory and I/O usage and decide which is the bottleneck
- top
- to check the current memory and cpu usage of mysqld as well as of the applications
- mytop
- to figure out which queries cause trouble
- mysql-admin (the GUI application, not to be confused with mysqladmin)
- to monitor and tune mysql in a very convenient way
- mysqlreport
- whose output can be used as a kind of step-by-step checklist
Using these tools, most applications can also be categorised very broadly into the following groups:
- I/O based and reading (blogs, news)
- I/O based and writing (web access tracker, accounting data collection)
- CPU based (complex content management systems, business apps)
Optimising the Tables
Use the following command regularly to reorganize the disk space, which reduces the table size without deleting any records[1]:
OPTIMIZE TABLE MyTable1
Moreover, when creating tables, prefer the smallest suitable column types. For example:
- if a number is always positive, choose an unsigned type to be able to store twice as large a range in the same number of bytes.
- to store contemporary dates (from 1970 to 2038), it is better to take a timestamp on four bytes than a datetime on eight.[2]
Optimising the Queries
Comparing functions with BENCHMARK
The BENCHMARK() function can be used to compare the speed of MySQL functions or operators. For example:
mysql> SELECT BENCHMARK(100000000, CONCAT('a','b'));
+---------------------------------------+
| BENCHMARK(100000000, CONCAT('a','b')) |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
1 row in set (21.30 sec)
However, this cannot be used to compare queries:
mysql> SELECT BENCHMARK(100, SELECT `id` FROM `lines`);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near
'SELECT `id` FROM `lines`)' at line 1
As MySQL needs a fraction of a second just to parse the query, and the system is probably busy doing other things too, benchmarks with runtimes of less than 5-10 seconds can be considered meaningless, and runtime differences of that order of magnitude pure chance.
Analysing functions with EXPLAIN
When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, providing information about how tables are joined and in which order. This makes it possible to see where indices are missing and to place possible optimizer hints accordingly.
Using and understanding EXPLAIN is essential when aiming for good performance; the relevant chapters of the official documentation are therefore mandatory reading!
A simple example
The join of two tables, neither of which has an index:
mysql> explain SELECT * FROM a left join b using (i) WHERE a.i < 2;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 3 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.01 sec)
Now the second table gets an index, and the EXPLAIN output shows that MySQL now knows that only 2 of the 3 rows have to be used.
mysql> ALTER TABLE b ADD KEY(i);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> explain SELECT * FROM a left join b using (i) WHERE a.i < 2;
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 1 | SIMPLE | b | ref | i | i | 5 | test.a.i | 2 | |
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+
2 rows in set (0.00 sec)
Now the first table also gets an index so that the WHERE condition can be evaluated better, and MySQL knows that only 1 row from the first table is relevant before even searching for it in the data file.
mysql> ALTER TABLE a ADD KEY(i);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> explain SELECT * FROM a left join b using (i) WHERE a.i < 2;
+----+-------------+-------+-------+---------------+------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+----------+------+-------------+
| 1 | SIMPLE | a | range | i | i | 5 | NULL | 1 | Using where |
| 1 | SIMPLE | b | ref | i | i | 5 | test.a.i | 2 | |
+----+-------------+-------+-------+---------------+------+---------+----------+------+-------------+
2 rows in set (0.02 sec)
Optimising The MySQL Server
Status and server variables
MySQL can be monitored and tuned by watching the status variables and setting the server variables, both of which can be global or per session. The status variables can be monitored with SHOW [GLOBAL|SESSION] STATUS [LIKE '%foo%'] or mysqladmin [extended-]status. The server variables can be set in the /etc/mysql/my.cnf file or via SET [GLOBAL|SESSION] foo = bar, and shown with mysqladmin variables or SHOW [GLOBAL|SESSION] VARIABLES [LIKE '%foo%'].
Generally status variables start with a capital letter and server variables with a lowercase one.
When dealing with the above-mentioned per-session system variables, it should always be considered that those have to be multiplied by max_connections to estimate the maximal memory consumption. Failing to do so can easily lead to server crashes during load peaks, when more clients than usual connect to the server! A quick and dirty estimate can be made with the following formula:
min_memory_needed = global_buffers + (thread_buffers * max_connections)
global_buffers: key_buffer innodb_buffer_pool innodb_log_buffer innodb_additional_mem_pool net_buffer
thread_buffers: sort_buffer myisam_sort_buffer read_buffer join_buffer read_rnd_buffer
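The estimate can be sketched as a small script. The buffer names mirror the settings listed above; the sample sizes are made-up example values and must be replaced with the settings from your own my.cnf.

```python
# Rough upper-bound estimate of mysqld memory consumption.
# All sizes are example values in bytes; substitute your own settings.

MB = 1024 * 1024

global_buffers = {
    "key_buffer_size": 128 * MB,
    "innodb_buffer_pool_size": 512 * MB,
    "innodb_log_buffer_size": 8 * MB,
    "innodb_additional_mem_pool_size": 8 * MB,
    "net_buffer_length": 16 * 1024,
}

per_thread_buffers = {
    "sort_buffer_size": 2 * MB,
    "myisam_sort_buffer_size": 8 * MB,
    "read_buffer_size": 128 * 1024,
    "join_buffer_size": 256 * 1024,
    "read_rnd_buffer_size": 256 * 1024,
}

max_connections = 100

def min_memory_needed(global_bufs, thread_bufs, max_conns):
    """min_memory_needed = global_buffers + (thread_buffers * max_connections)"""
    return sum(global_bufs.values()) + sum(thread_bufs.values()) * max_conns

print(f"{min_memory_needed(global_buffers, per_thread_buffers, max_connections) / MB:.0f} MB")
```

With these example settings the worst case is well over 1.5 GB, which shows why a harmless-looking per-thread buffer can sink a server with many connections.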
Note: Especially when dealing with server settings, all information should be verified against the respective chapters of the official documentation, as these are subject to change and the authors of this text lack confirmed knowledge about how the server works internally.
Index / Indices
Indices are a way to locate elements faster. This works for single elements as well as for ranges of elements.
Experiment
Note: when you run your timing tests, make sure the query cache is disabled (query_cache_type=0
in my.cnf) to force MySQL to recompute your queries each time you type them instead of just returning the pre-computed results from the cache.
Let's run the following Perl program:
#!/usr/bin/perl
use strict;
print "DROP TABLE IF EXISTS weightin;\n";
print "CREATE TABLE weightin (
id INT PRIMARY KEY auto_increment,
line TINYINT,
date DATETIME,
weight FLOAT(8,3)
);\n";
# 2 million records, interval = 100 s
for (my $timestamp = 1000000000; $timestamp < 1200000000; $timestamp += 100) {
my $date = int($timestamp + rand(1000) - 500);
my $weight = rand(1000);
my $line = int(rand(3)) + 1;
print "INSERT INTO weightin (date, line, weight) VALUES (FROM_UNIXTIME($date), $line, $weight);\n";
}
What does it do? It simulates the data feed from industrial lines that weigh material at regular intervals so that we can compute the average material usage. Over time, lots of records pile up.
How to use it?
mysql> CREATE DATABASE industrial;
$ time perl generate_huge_db.pl | mysql industrial
real 6m21.042s
user 0m37.282s
sys 0m51.467s
We can check the number of elements with:
mysql> SELECT COUNT(*) FROM weightin;
+----------+
| count(*) |
+----------+
| 2000000 |
+----------+
1 row in set (0.00 sec)
The resulting files are fairly large:
$ perl generate_huge_db.pl > import.sql
$ ls -lh import.sql
-rw-r--r-- 1 root root 189M jun 15 22:08 import.sql
$ ls -lh /var/lib/mysql/industrial/weightin.MYD
-rw-rw---- 1 mysql mysql 35M jun 15 22:17 /var/lib/mysql/industrial/weightin.MYD
$ time mysqldump industrial > dump.sql
real 0m9.599s
user 0m3.792s
sys 0m0.616s
$ ls -lh dump.sql
-rw-r--r-- 1 root root 79M jun 15 22:18 dump.sql
$ time mysqldump industrial | gzip > dump.sql.gz
real 0m17.339s
user 0m11.897s
sys 0m0.488s
$ ls -lh dump.sql.gz
-rw-r--r-- 1 root root 22M jun 15 22:19 dump.sql.gz
Incidentally, restoring from the dump is way faster, because it uses extended inserts!
# time zcat dump.sql.gz | mysql industrial
real 0m31.772s
user 0m3.436s
sys 0m0.580s
This SQL command will scan all records to compute a total sum:
mysql> SELECT SUM(weight) FROM weightin;
Let's say we need to compute the total material used during January 1st 2008:
mysql> SELECT COUNT(*), SUM(weight) FROM weightin WHERE date >= '2008-01-01' AND date < '2008-01-02';
MySQL will again need to scan the entire table, even for this tiny number of records. This is because the matching records can be anywhere: at the beginning, at the end, in the middle; nothing guarantees that the records are ordered.
To improve this, we can add an index on the 'date' field. MySQL will then maintain a hidden structure with all the dates sorted chronologically, storing for each its offset (position) in the 'weightin' table so that the full record can be retrieved.
Because the index is sorted, it is way faster for MySQL to locate a single record (using a binary search algorithm) or even a range of data (find the first and last elements; the range lies in between).
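The lookup strategy can be illustrated in a few lines of Python. This is a simplified model, not MySQL's actual on-disk format: `bisect` performs the same kind of binary search on a sorted list of (date, offset) pairs that MySQL performs on its index structure.

```python
import bisect

# Simplified model of an index: (key, row_offset) pairs kept sorted by key.
# The offsets are made-up positions of the full records in the data file.
index = sorted([("2008-01-01 08:00", 17), ("2007-12-31 23:00", 3),
                ("2008-01-01 12:00", 42), ("2008-01-02 00:30", 58)])

def range_lookup(index, lo, hi):
    """Return row offsets for lo <= key < hi using two binary searches."""
    start = bisect.bisect_left(index, (lo,))
    end = bisect.bisect_left(index, (hi,))
    return [offset for _, offset in index[start:end]]

# All rows from January 1st 2008, without scanning the whole "table":
print(range_lookup(index, "2008-01-01", "2008-01-02"))  # [17, 42]
```

Two O(log n) searches bound the range; everything in between is guaranteed to match, which is why sorted indices make range conditions like `WHERE date >= ... AND date < ...` cheap.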
To add the index:
ALTER TABLE weightin ADD INDEX (date);
The index is not used if the query needs a computation on the field (e.g. TIME(date)
), but it works for ranges (e.g. WHERE date < '2008-01-02'
).
You can notice that the .MYI file grew:
$ ls -lh /var/lib/mysql/industrial/
-rw-rw---- 1 mysql mysql 49M jun 15 22:36 weightin.MYI
That's where MySQL stores the indices. Initially there was already an index for the 'id' field, which is the case for all primary keys.
Another example
Another example: let's say we want to optimise this query:
mysql> SELECT DISTINCT line FROM weightin;
We can do so by adding an index on the 'line' field, which groups the duplicates together and saves the query from rescanning the whole table to locate them:
ALTER TABLE weightin ADD INDEX (line);
The index file grew:
-rw-rw---- 1 mysql mysql 65M jun 15 22:38 weightin.MYI
General considerations
The first and foremost question for SELECT queries is always whether indices (aka "keys") are configured and, if they are, whether they are actually used by the database server.
- 1. Check if the indices are actually used
Individual queries can be checked with the "EXPLAIN" command. For the whole server the "Sort_%" variables should be monitored as they indicate how often MySQL had to browse through the whole data file because there was no usable index available.
- 2. Are the indices buffered?
Keeping the indices in memory improves read performance a lot. The quotient Key_reads / Key_read_requests tells how often MySQL actually had to access the index file on disk when it needed a key. The same goes for Key_writes; use mysqlreport to do the math for you here. If the percentage is too high, key_buffer_size for MyISAM and innodb_buffer_pool_size for InnoDB are the corresponding variables to tune.
The Key_blocks_% variables can be used to see how much of the configured key buffer is actually used. The unit is 1 KB unless set otherwise in key_cache_block_size. As MySQL uses some blocks internally, Key_blocks_unused has to be checked. To estimate how big the buffer should be, the sizes of the relevant .MYI files can be summed up. For InnoDB there is innodb_buffer_pool_size, although in that case not only the indices but also the data is buffered.
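The miss ratio described above is easy to compute by hand from the two status variables. A sketch (the sample numbers are made up):

```python
def key_cache_miss_ratio(key_reads, key_read_requests):
    """Fraction of key requests that had to go to the index file on disk.
    Key_reads / Key_read_requests, as reported by SHOW GLOBAL STATUS.
    A common rule of thumb is to keep this well below a few percent."""
    if key_read_requests == 0:
        return 0.0
    return key_reads / key_read_requests

# Example values (made up): 6,000 disk reads for 1,000,000 key requests.
print(f"{key_cache_miss_ratio(6_000, 1_000_000):.2%}")  # 0.60%
```

If the ratio stays high even with a generous key_buffer_size, the working set of indices simply does not fit in memory and either the buffer or the indices themselves need rethinking.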
- 3. Further settings
sort_buffer_size (per-thread) is the memory that is used for ORDER BY and GROUP BY. myisam_sort_buffer_size is something completely different and should not be altered.
read_buffer_size (per-thread) is the size of the memory chunks that are read from disk into memory at once when doing a full table scan, as big tables do not fit into memory completely. This seldom needs tuning.
Query cache
The main reason not to use any MySQL version below 4.0.1 for read-heavy applications is that beginning with that version, MySQL can store the results of SELECT queries until their tables are modified.
The query cache can be configured with the query_cache_% variables. Most important here are the global query_cache_size and query_cache_limit, which prevents single queries with unusually big results (larger than this size) from using up the whole cache.
Note that the query cache blocks have a variable size, the minimum being query_cache_min_res_unit; after a complete cache flush the number of free blocks is therefore ideally just one. A large value of Qcache_free_blocks simply indicates high fragmentation.
Worth monitoring are the following variables:
- Qcache_free_blocks
- If this value is high, it indicates high fragmentation, which need not be a bad thing though.
- Qcache_not_cached
- If this value is high, there are either many uncachable queries (e.g. because they use functions like NOW()) or the value of query_cache_limit is too low.
- Qcache_lowmem_prunes
- This is the number of old results that were purged because the cache was full, not because their underlying tables were modified. query_cache_size must be increased to lower this variable.
Examples:
An empty cache:
mysql> SHOW VARIABLES LIKE 'query_cache_type';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| query_cache_type | ON    |
+------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'query_cache_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| query_cache_size | 0     |
+------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.00 sec)
A used cache (savannah.gnu.org):
mysql> SHOW VARIABLES LIKE "query_cache_size";
+------------------+----------+
| Variable_name    | Value    |
+------------------+----------+
| query_cache_size | 33554432 |
+------------------+----------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE "Qcache%";
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1409     |
| Qcache_free_memory      | 27629552 |
| Qcache_hits             | 7925191  |
| Qcache_inserts          | 3400435  |
| Qcache_lowmem_prunes    | 2946778  |
| Qcache_not_cached       | 71255    |
| Qcache_queries_in_cache | 4546     |
| Qcache_total_blocks     | 10575    |
+-------------------------+----------+
8 rows in set (0.00 sec)
The matching my.cnf
configuration parameter is:
query_cache_size = 32M
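From status numbers like those shown above, the cache hit rate can be computed by hand. A sketch, using the common approximation that every SELECT either hits the cache, gets inserted into it, or is not cacheable (the savannah.gnu.org figures serve as input):

```python
def qcache_hit_rate(hits, inserts, not_cached):
    """Approximate fraction of SELECTs answered from the query cache,
    from the Qcache_hits, Qcache_inserts and Qcache_not_cached variables."""
    total = hits + inserts + not_cached
    return hits / total if total else 0.0

# Values from the savannah.gnu.org example above:
print(f"{qcache_hit_rate(7_925_191, 3_400_435, 71_255):.1%}")  # 69.5%
```

Roughly 70% of SELECTs served from the cache is decent; the high Qcache_lowmem_prunes in the same output nevertheless suggests that query_cache_size could be raised further.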
To clear the cache (useful when testing a new query's efficiency):
mysql> RESET QUERY CACHE;
Query OK, 0 rows affected (0.00 sec)
Waiting for locks
The Table_locks_% variables show the number of queries that had to wait because the tables they tried to access were currently locked by other queries. Such situations can be caused by "LOCK TABLE" statements as well as by e.g. simultaneous write accesses to the same table.
Table cache
MySQL needs a certain time just to "open" a table and read its metadata, like column names etc.
If many threads are trying to access the same table, it is opened multiple times.
To speed this up the meta data can be cached in the table_cache (alias table_open_cache since MySQL 5.1.3).
A good value for this setting is max_connections multiplied by the number of tables usually used per SELECT.
Using mysqlreport, or by looking at the currently Open_tables, the Opened_tables since server start, and the Uptime, the number of necessary table opens per second can be calculated (consider off-peak times like nights, though).
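The calculation just described is a one-liner; a sketch with made-up sample numbers:

```python
def table_opens_per_second(opened_tables, uptime_seconds):
    """Average rate of table opens since server start, derived from the
    Opened_tables and Uptime status variables."""
    return opened_tables / uptime_seconds if uptime_seconds else 0.0

# Example (made up): 864,000 table opens over a 10-day uptime.
rate = table_opens_per_second(864_000, 10 * 24 * 3600)
print(f"{rate:.2f} opens/s")  # 1.00 opens/s
```

A sustained rate of around one open per second or more usually means the table cache is too small for the working set of tables.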
Connections and threads
For every client connection (aka session), MySQL creates a separate thread under the main mysqld process. For big sites with several hundred new connections per second, creating the threads itself can consume a significant amount of time. To speed things up, idle threads can be cached after their client disconnects. As a rule of thumb, not more than one thread per second should be newly created. Clients that send several queries to the server should use persistent connections, e.g. via PHP's mysql_pconnect() function.
This cache can be configured with thread_cache_size and monitored with the Threads_% variables.
To avoid overload, MySQL blocks new connections if more than max_connections are currently in use. Start with max_used_connections and monitor the number of connections that were aborted in Aborted_clients and those that failed or timed out in Aborted_connects. Forgotten disconnects from clients that use persistent connections can easily lead to a denial-of-service situation, so be aware! Normally connections are closed after wait_timeout seconds of being idle.
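The rule of thumb above ("not more than one newly created thread per second") can likewise be checked from the status variables; the sample numbers here are made up:

```python
def threads_created_per_second(threads_created, uptime_seconds):
    """Rate of thread creation since server start, from the Threads_created
    and Uptime status variables. Should stay below roughly 1/s; otherwise
    thread_cache_size is probably too small."""
    return threads_created / uptime_seconds if uptime_seconds else 0.0

# Example (made up): 5,400 threads created during one hour of uptime.
print(f"{threads_created_per_second(5_400, 3_600):.2f}/s")  # 1.50/s
```

In this example the server creates 1.5 threads per second, so raising thread_cache_size (or switching clients to persistent connections) would be worth trying.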
Temporary tables
It is perfectly normal that MySQL creates temporary tables while sorting or grouping results. Those tables are either held in memory or, if too large, written to disk, which is naturally much slower. The number of disk tables among the Created_tmp_% variables should be negligible; otherwise the settings max_heap_table_size and tmp_table_size should be reconsidered.
Delayed writes
In situations like writing webserver access logs to a database, with many subsequent INSERT queries of rather unimportant data into the same table, performance can be improved by advising the server to cache the write requests for a little while and then write a whole batch of data to disk at once.
Be aware though that all the methods mentioned here contradict ACID compliance, because INSERT queries are acknowledged with OK to the client before the data has actually been written to disk and can thus still get lost in case of a power outage or server crash. Additionally, the side effects mentioned in the documentation often read like the patient information leaflet of a modern medicament...
MyISAM tables can be given the DELAY_KEY_WRITE option using CREATE or ALTER TABLE. The drawback is that after a crash the table is automatically marked as corrupt and has to be checked/repaired which can take some time.
InnoDB can be told with innodb_flush_log_at_trx_commit to delay writing the data a bit. In case of a server crash the data itself is supposed to be still consistent, just the indices have to be rebuilt.
INSERT DELAYED works with some of the main storage engines on a per-query basis.
Further reading
Useful links regarding the optimisation of MySQL servers:
- Various newsgroups and the MySQL mailing lists
- A guide to mysqlreport
- The book High Performance MySQL
- Tuning tips from the company EZ
- MySysop, a PHP script for MySQL optimisation and tuning