Query optimization is the often overlooked part of applications. Development schedules being what they are, getting the right results and getting the application working are the main priorities. So thoroughly testing, and benchmarking queries is often left as an afterthought.
With our short introduction to query optimization in MySQL, we hope to encourage at least some attention to these issues up front. We'll also help you identify some of the more common optimizations you may run across.
1. Optimizing Data Types
There are a lot of different data types to choose from in MySQL, and sometimes it can be overwhelming. If you're a DBA you're looking at data types strictly in terms of optimizations, and that is your criteria for choosing. But developers may be looking at them in terms of elegant solutions, speed of coding, or simply the most obvious choice. So as a DBA you are likely to be faced with non-optimal choices here. When you can, try to influence these decisions using some general guidelines.
a. Try to avoid NULLs
Nulls are a special case in most databases. MySQL is no exception to this rule. They require more coding to handle internally, more checking, special index logic, and so on. Some developers simply aren't aware, and when NULL is the default, that is the way tables are created. However it's better to use NOT NULL in most cases, and even use a special value, such as 0 or -1 for your default value.
b. Use smaller fields where possible
When MySQL reads data from disk, it stores it in memory, and uses cpu cycles and disk I/O to read it. That means smaller data types, which take up less space will be read from disk and packed into memory more efficiently. That said; don't go so crazy with setting these data types small that you don't have room for unexpected changes in the application later. Alterations of the table will require restructuring, which is overhead, and potentially code changes, which is also a headache down the line. So strike a balance, but don't make fields larger than they need to be either.
2. Beware of Character Set Conversions
The character set that your client or application is using may be different from that of the table itself. This will require MySQL to implicitly convert it on the fly. In addition, certain character sets such as UTF8 support multi-byte characters, so they make storage requirements greater.
3. Optimizing COUNT (my_col) and COUNT (*)
If you're using MyISAM tables, count(*) with no where clause is very fast because the statistics on rowcounts is exact. So MySQL doesn't have to look at the table at all to get the count. The same can be said for count(my_col) if that column is NOT NULL.
If you are doing count() with a where clause, there's not much you can do to optimize it further, beyond the obvious of indexing the column in the where clause. It may be possible that a covering index will help you in this case, for a more complex where clause.
Short of the above suggestions, you might go with summary tables. These can allow you to keep up to date information about the contents of your table. You can use triggers, or application logic to keep the summary table always up to date, or you can run a batch job periodically to fill it with up to date information. If you do the latter, your information will be close, but not exact, depending on how often the batch job runs. Weigh your application's need for precise information against the overhead of keeping this data up to date, and strike a balance.
4. Optimizing Subqueries
MySQL's query optimization engine isn't always the most efficient when it comes to subqueries. That's why it is often a good idea to convert a subquery to a join. Joins have already been handled properly by the optimizer. Of course, be sure the column you're joining on in the second table is indexed. On the first table MySQL usually does a full table scan on against the subset of rows from the second table. This is part of the nested loops algorithm, which MySQL often engages to perform join operations.
5. Optimizing UNION
UNION has an interesting optimization that exists across a few different databases. It's obvious when you think about how it works. UNION gives you the rows from two tables that don't exist in the other. So implicitly, you are removing duplicates. To do this the MySQL database must return distinct rows, and thus must sort the data. Sorting, as we know is expensive, especially for large tables.
UNION ALL can very well be a big speedup for you. What if you already know that your data does not contain duplicates in either row, or what if you don't care about duplicates? In either case, UNION ALL is for you. Further, there may be other ways you can avoid the duplicates in your rows using some application logic, so you know that UNION ALL will provide the results you want, without the heavy overhead of sorting the data.
This list of query optimizations is by no means exhaustive, but should set you on the right track to optimizing the SQL in your application. Whenever possible, try a few different solutions, look at the query plans, test on large datasets, benchmark those results, and see what works in the real world. Also, be proactive by keeping an eye on your slow query log, and identify further queries that may need tuning as early as possible.
» See All Articles by Columnist Sean Hull