MySQL is a great database for web-facing applications, however, it tends to be a bit rough around the edges. Enter Maatkit, a great toolkit with a bewildering array of command line tools that fill the gap where MySQL's native tools leave off. From data replication to query profiling and optimizing, Maatkit has tools to make you smarter, and help you get your job done.
MySQL is a great database for web-facing applications, and it's evolved to provide a surprisingly large set of sophisticated features well beyond what the original designers probably envisaged.
However, it tends to be a bit rough around the edges, and it takes some shortcuts, and fudges a bit all in the name of providing you the most bang for your virtual buck.
Enter Maatkit, a great toolkit with a bewildering array of command line tools that fill the gap where MySQL's native tools leave off. From Replication to query profiling and optimizing, Maatkit has tools to make you smarter, and help you get your job done.
Front and center is the table checksum tool. Why would you need to create checksums on your tables, you ask. Turns out MySQL's replication technology is deceptively easy to set up, and it may *seem* to be running properly, all the while your slave does not match your master. That's right, fire up this tool, and you'll will be surprised or even horrified to find your slave is not only different, but if you were performing backups off the slave, your backups are invalid as well!
The tool works by creating checksums of table contents at a point in time, then propagating those checksum tables through replication to the slave, where you can then run mk-table-checksum with different options to check for consistency.
This tool is like mk-table-checksum's assistant, allowing you to filter and sift through the output for the relevant information you want.
When you have found differences in tables between your master and slave database, you'll need a method for bringing them back in line. That's where mk-table-sync comes in. It allows you to bring over the missing rows or repair and update the changes, without having to completely rebuild your slave database.
MySQL's replication technology typically keeps up pretty well with the master, all things considered, but what if you want your slave to be behind the master? Perhaps you want to protect yourself from operator error, an accidental drop table, or drop database? With mk-slave-delay set to perhaps one hour, you can do this kind of thing. It basically disables the slave SQL thread for the specified period of time, and applies past transactions from the relay log based on the embedded timestamps there.
The nature of MySQL's statement based replication is that it serialized, which means all of the sessions that were operating concurrently on production, stand in line on the slave, in the order they were committed. In addition, the slave is only running inserts, updates, deletes, drops, creates, and so on. So it's cache may not always be as *warm* as the production cache. A warm cache means the blocks of data you're looking for don't have to be fetched from disk, because they're already in your cache, resulting in a logical I/O instead of a physical I/O. What that means to you and me is they're much much faster! mk-slave-prefetch provides a way to *warm up* your cache before the slave actually tries to run the relevant queries. This works well in parallel as it's only attempting to do the disk I/O before you actually need it, helping the slave perform and stay in closer sync with the master.
When you first learn about replication in MySQL you issue the "show slave status" command. You are probably already familiar with the "Seconds_Behind_Master:" field. Turns out the information here is not reliable. That's where mk-heartbeat comes in. It creates a heartbeat record on the master, and then that is propagated through replication. The check on the slave then looks at that record to see the actual slave lag. Nifty!
The tools mentioned previously talk about keeping a slave in sync when it might fail silently or otherwise get out of sync. This tool tries to keep the slave running after an actual error, by skipping the error. In general, it's a better idea to monitor why your slave is out of sync or getting errors, and fix them properly. You wouldn't want to build up a false sense of security when this tool is really just buying time. Nevertheless, in some cases it might be handy.
If you have a simple master-slave setup, you probably don't need a way of mapping your replication configuration visually. However, if you have a more complex setup, this tool might be useful. For instance master-master with slaves, or circular with active and passive master, or three masters in a circular configuration, or even a distribution master to take load off the real master, and support lots of slaves. All of these configurations could make use of this tool to give you insight on your setup.
If you've ever wanted to move a replication slave from one point in your topology to another, you'll find it is a very manual process, with plenty of potential for mistakes. This tool automates that process, if you need to do it.
This tool has a lot of different interesting use cases, and if you're doing server performance tuning, it will definitely help you. First, you can use it to monitor the processlist in MySQL dynamically, capturing data for a period of time. By probing the database in this way, it's able to do sub-second slow query capturing, which earlier versions of MySQL couldn't do (without a patch). When you finish capturing data, it will dump the results to standard out, providing you with a handy text-based graph of the activities of your database, and the queries it ran during that interval.
The tool can also be used in conjunction with tcpdump. This might be useful if you didn't have access to login to the MySQL server itself, or if you wanted to be very gentle on the server you're monitoring.
The tool can also be used to analyze a slow query log itself, so all-in-all a great tool!
The explain facility in MySQL is great for giving you insight on how the optimizer will get your data from the respective storage engines. This tool will format the output of explain as a tree diagram, which can be more understandable.
You can hand this tool a file with some SQL statements in it, and it will run them in batch, and provide information about what resources those queries used in aggregate. Did they open temporary tables, or do full table scans? This tool will tell you.
This works in conjunction with mk-query-profile to format the output in a more readable way, allowing you to view the results of multiple profiles side-by-side.
Maatkit is a tremendous toolkit for a MySQL DBA. It directly addresses day-to-day needs of DBAs, providing features you never knew you wanted, and some you sorely missed. In next month's article, we'll cover the rest of the Maatkit suite of command line tools - yep there's more, a lot more!
Maatkit Project Hosting on Google Code