There are many ways you can replicate MySQL data, whether it be a simple two-node cluster, chain or master/master. This article explains the two types of data replication (Statement-Based Replication and Row-Based Replication) available in MySQL.
There are many ways you can replicate MySQL data whether it be a simple two-node cluster, chain or master/master. If you use MySQL you probably know about replication and might have experimented with the replication layouts' listed above. In this article I will explain not the layout of replication and why or why not to use a specific layout but the types of replication you can use. In MySQL, you can use two types of replication, Statement-Based Replication (SBR) and Row-Based Replication (RBR).
In MySQL, replication is basically the slave server reading binary logs on the master server then running the statements or applying blocks to the slave server. Depending on the type of replication you are using, events are recorded in different formats to the binary log. Below are the different formats explained:
- At first, replication was based on propagation of SQL statements from the master server to the slave server. This replication format is called statement-based replication (SBR) and is default in older versions of MySQL (<= 5.1.4). Just as a note that in later versions of MySQL, especially with the Innodb Plugin, you NEED to run your transaction_isolation with REPEATABLE-READ.
- The newer replication type is row-based replication (RBR), which logs changes in individual table rows to the binary log. Basically, logging the actual change and not the SQL statement itself.
- MySQL also has the ability to change its binary logging format in real time depending on the type of event using the mixed format binary logging. When the mixed format is turned on, statement-based replication is on by default but will change to row-based replication is particular cases. For more information on Mixed Binary Logging please see Mixed Binary Logging Format in the MySQL Documentation.
As you probably have gathered, all three types of logging have their own unique advantages and disadvantages. Many users are still on statement-based replication; however, the mixed replication format should be sufficient when it comes to performance and data integrity. Below I will list the advantages and disadvantages of the binary logging formats' so you will be able to make a better decision for your system.
Statement-Based Replication - Advantages
- It has been around from the beginning of MySQL version 3.23, although back then it was a lot more buggy!
- The binary logs do not grow as fast with statement-based replication as they do in row-based replication. This makes it faster to recover from a backup.
- The binary log files contain all write activity in SQL format making it easier to read with mysqlbinlog and audit your MySQL instance.
- Table definitions must be (nearly) identical on master and slave. This is a bit different than row-based replication where you have to deal with ordinal positions and the same data type characteristics per column.
Statement-Based Replication - Disadvantages
- Nondeterministic events that change data are a large issue with statement-based replication. It is very important to note that, in some cases the behavior of an event will not yield the same result on the slave server as it did on the master server. Below are some examples:
- When using a user defined function (UDF) or a stored routine that is nondeterministic, since the value returned by a UDF or stored routine or the value depends on factors other than the parameters supplied to it like the function USER().
- UPDATE and DELETE statements that use a LIMIT clause without an ORDER BY clause are considered nondeterministic. For example:
- UPDATE sometable set id = 41 where user like '%someuser%' limit 1;
- Note that there could be many rows matched in the where clause but there is no ORDER BY clause.
o Statements using any of the following functions cannot be replicated properly using statement-based replication:
- UUID(), UUID_SHORT()
- SYSDATE() (unless both the master and the slave are started with the --sysdate-is-now option)
NOTE: All other functions are replicated correctly using statement-based replication, including RAND(), NOW(), etc. For more information, see Replication and System Functions.
- Statements that cannot be replicated correctly using statement-based replication are logged with a warning like the one shown here:
[Warning] Statement is not safe to log in statement format.
- You can view a similar warning on the client side by running SHOW WARNINGS.
- Both insert and select queries require a larger number of row-level locks compared to row-based replication.
- A table scan caused by an update statement where the update is not using an index in the where clause will lock a larger number of rows.
- For InnoDB: An INSERT statement that uses AUTO_INCREMENT blocks other non-conflicting INSERT statements.
- All statement must be evaluated and executed on the slave before the rows are updated or inserted. This can have considerable performance issues on the slave staying up to date. If there is an error in evaluation on the slave, statement-based replication may have degraded integrity over time.
- Stored functions execute with the same NOW() value as the calling statement.
- Deterministic UDFs must be applied on the slaves.
Here are some important notes about SBR, RBR and the MySQL version you may be running.
- For versions earlier than 5.1.14 DDL statements like CREATE are replicated using SBR, while DML statements along with GRANT and REVOKE are replicated using RBR.
- In versions including 5.1.14 and higher the mysql database is considered a node-specific database.
- Any manipulation of the mysql database tables, such as GRANT, REVOKE and trigger manipulation are all replicated using SBR.
Row-Based Replication - Advantages
- This is considered the safest way to replicate data in MySQL because ALL changes can be replicated.
- SBR is used for statements like CREATE TABLE while the row insertions and updates use RBR.
- RBR has been around for a while, that said, if you have used a similar RDBMS with RBR you should be able to use MySQL RBR with ease.
- Fewer row locks are needed on the master, for the following types of statements:
- INSERT statements with AUTO_INCREMENT
- UPDATE of DELETE statements that don't require a table scan or change most of the examined rows
- Fewer row locks are needed on the slave for INSERT, UPDATE and DELETE statements.
Row-Based Replication - Disadvantages
- The table schema must be the same from master to slave when using row-based replication.
- You cannot easily examine the binary log with RBR. Instead, beginning with MySQL 5.1.29, you can see what data was changed using mysqlbinlog with the options--base64-output=DECODE-ROWS and--verbose.
- If a statement changes many rows, let's say with an UPDATE, row-based replication writes more data to the binary log even for statements that are rolled back. Running point-in-time snapshots can take more time as well. Concurrency problems may come into play given the lock times needed to write large chunks of data into the binary log.
- Deterministic UDFs that generate large BLOB values take longer to replicate with row-based replication because the actual value is replicated, not the SQL statement.
- You can get different results when running SELECT statements on the slave and master servers while running bulk inserts.
As stated earlier in this article, the MIXED format should provide you with the most data integrity and performance. However, if you do choose to set your binlog_format to STATEMENT or ROW you should test the heck out of it before you implement into production. I currently use both MIXED and STATEMENT-based replication in production depending on the application I'm working with.