Point in time backup and recovery is a crucial component of any part of any MySQL environment. This article describes how to implement the basic point in time recovery and describes a few mechanisms to accomplish this goal.
As all of you know, point in time backup and recovery is a crucial component of any part of any MySQL environment. From LVM (Logical Volume Manager) snapshots to MySQLdump to Xtrabackup there are many ways to accomplish point in time recovery. Some methods are more reliable and some are easier to work with than the next. In any case, all of them need to be correct at the end of a recovery. In this article, I will describe how to implement the basic point in time recovery and describe a few mechanisms to accomplish this goal, one is an older method and one is a newer method.
LVM is a great tool and it is easy to setup and work with. LVM has many benefits, for example:
- It can resize volume groups online by absorbing new physical volumes (PV)
- Resize logical volumes' (LV) online by concatenating extents onto LVs or truncating extents from them
- Stripe whole or parts of LVs across multiple PVs
- Mirror whole or parts of LVs
- Move online LVs between PVs
- Split or merge volume groups
For backup purposes, LVM comes with the ability to create read and read-write snapshots of logical volumes. Percona has a great article to get you started: Using LVM for MySQL Backup and Replication Setup . Below are the very basic steps to accomplish an LVM snapshot for a MySQL server.
1. You need to freeze your database modifications (FLUSH TABLES WITH READ LOCK)
2. Create LVM snapshot
3. FLUSH LOCK
4. If you are replicating you should save information about your replica (i.e. SHOW SLAVE STATUS)
5. Mount snapshot and save your datadir files
6. Unmount and remove snapshot
If you are so inclined to use LVM's snapshot capabilities you should look at mylvmbackup. In the past, LVM snapshots where arguably the best way to run a point in time backup, although today, there are other more flexible ways to accomplish the same goal. So, now that we have the gist of what LVM snapshots are all about, I would like to add that LVM could become cumbersome to work with and slow, especially on extremely large systems. Not to mention that in high write systems a flush table with read lock will probably take a long time and interrupt service to your customers in one way or another.
If you have read any of my previous articles you probably know where this article is going, yep, you guessed it, Xtrabackup. Xtrabackup is really a set of tools made up of, xtrabackup, innobackupex and tar4idb.
- Xtrabackup - is a complied C binary, which copies only InnoDB and XtraDB data.
- Innobackupex - is a wrapper script that provides functionality to backup a whole MySQL database instance with MyISAM, InnoDB and XtraDB tables.
- Tar4idb - tars InnoDB data safely
For the example below, I will be using innobackupex on a slave MySQL instance to get a point in time snapshot. Keep in mind that this is just one of the many configurations you could use for your installation. If you have not already, please check out my previous article, "Working with MySQL Multi-master Replication - Keeping a True Hot Standby," for the type of setup this backup system is implemented on.
For this set up, I am using MySQL 5.5.X rc with semi-synchronous replication. It is important to understand what semi-synchronous replication is before we get into this backup method. In MySQL 5.5, there is an interface to semi-synchronous replication in addition to the built-in asynchronous replication. Semi-synchronous replication is installed as a plug-in and can be used as an alternative to asynchronous replication. This type of replication works as follows:
1. A slave server, upon connecting to a master server, will inform the master that it is has semi-synchronous replication activated.
2. When semi-synchronous replication is enabled on both the master and at least one slave, the thread that performs a transaction commit on the master blocks after the commit. After the commit, the thread waits until the semi-synchronous slave acknowledges all events for the transaction on the master, or a timeout occurs.
3. The slave acknowledges receipt of a transaction's events only after the events have been written to its relay log and flushed to disk.
4. This step is a bit scary but in the event of a timeout on the slave, the master server reverts to traditional, asynchronous replication. This typically occurs when a slave gets too far behind, well, further behind than rpl_semi_sync_master_timeout. When the slave server catches up the master will return to semi-synchronous replication.
5. The semi-synchronous plug-in must be installed and active on both the master and at least one slave for this type of replication to work.
(Semisynchronous Replication, http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html)
Needless to say that this is both a cool and scary (specifically around the possible flapping between semi-synchronous and asynchronous replication) feature of MySQL 5.5. Semi-synchronous replication is a great idea but I am wondering at what performance or integrity cost(s). This is not the right article to determine either performance or integrity implications' so we'll assume, for now, that semi-synchronous replication is good enough based on what is stated in the MySQL documentation:
"Compared to asynchronous replication, semisynchronous replication provides improved data integrity. When a commit returns successfully, it is known that the data exists in at least two places (on the master and at least one slave). If the master commits but a crash occurs while the master is waiting for acknowledgment from a slave, it is possible that the transaction may not have reached any slave." (Oracle)
"Semisynchronous replication does have some performance impact because commits are slower due to the need to wait for slaves. This is the tradeoff for increased data integrity. The amount of slowdown is at least the TCP/IP roundtrip time to send the commit to the slave and wait for the acknowledgment of receipt by the slave. This means that semisynchronous replication works best for close servers communicating over fast networks, and worst for distant servers communicating over slow networks. " (Oracle)
The reason for the use of semi-synchronous replication in this example stems from years of hearing people state that MySQL replication cannot be trusted in any way. To that I often replied, "Then why even have a slave, and why are you running your backups on it, and why are you letting your customers read from it?" I digress so we'll stop the argument there.
Basically, we are running backups on our slave servers with Xtrabackup and want the best possible integrity we can muster short of DRBD! That said, lets move to the final part of this article that explains how you can use Xtrabackup on your slave server to achieve point in time backups.
Like any MySQL installation, especially when customers pay for service, it is a good idea to have consistent backups and, in this case, point in time backups. My actual point in time backup script will not be posted here but the general process will be. As you may have guessed my point in time backups are run on slave servers and I explicitly specify the --slave-info flag in innobackupex. The --slave-info flag is defined as follows, take from innobackupex --help:
"This option is useful when backing up a replication slave server. It prints the binary log position and name of the binary log file of the master server. It also writes this information to the 'ibbackup_slave_info' file as a 'CHANGE MASTER' command. A new slave for this master can be set up by starting a slave server on this backup and issuing a 'CHANGE MASTER' command with the binary log position saved in the 'ibbackup_slave_info' file."
shell> innobackupex --defaults-file=/etc/my.cnf --user=someuser --password=some_password --slave-info /path/to/backup
You have to make sure that the backup completed ok so make sure that your output file has the following line:
101101 20:55:36 innobackupex-x.x.x: completed OK!
In the output log, I look for the following lines and log them to a database just in case the log file was erased.
innobackupex-x.x.x: Backup created in directory '/path/to/backup' innobackupex-x.x.x: MySQL binlog position: filename 'binary-logs.000001', position 107 innobackupex-x.x.x: MySQL slave binlog position: master host '0.0.0.0', filename 'binary-logs.000001', position 61560909
Then I run the following inside of the MySQL server:
mysql> FLUSH BINARY LOGS;
Now you need to backup all of the old binary logs, preferably to a location off the MySQL cluster, like on a filer or any other JBOD server. You'll probably want to make sure that your backup and binary logs are in the same place on the JBOD!
As always, mileage may vary depending on what you are doing, your system capabilities and your service level agreements. Make sure you test the recovery process at least once a quarter after implementation!