In one of the articles recently published on this forum, we have presented the point-in-time restore functionality incorporated into Azure SQL Database. Its primary benefit is the ability to recover from inadvertent database changes; however its applicability in the event of a disaster affecting an entire region is rather limited. In this article, we will turn our attention to solutions whose main purpose is to fill that gap, focusing in particular on the options that facilitate maintaining the operational status of an Azure SQL Database across different Azure regions.
Azure SQL Database offers different levels of site-level resiliency depending on your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements, exposed through three capabilities referred to as Geo-Restore, Standard Geo-Replication, and Active Geo-Replication. Their availability depends to some extent on the service pricing tier. More specifically, while Geo-Restore is supported on all tiers, Standard Geo-Replication requires at the minimum the higher-priced Standard level and Active Geo-Replication is limited to Premium only.
Let's review each of these three capabilities in more detail. We will identify the most common scenarios that each of them is best suited for and describe their implementation methods:
- Geo-Restore provides the ability to recover a database from its backup in situations where an incident affects the entire region where the server hosting the database resides. Similar to the point-in-time recovery, the restore utilizes backups residing in an Azure storage account; however, it is not limited to restores to the same server. Instead, it is able to take advantage of the geo-redundant characteristics of the storage account, which is configured as the RA-GRS type, replicating asynchronously to another region, with the secondary replica allowing for read-only access. On the other hand, even though incremental backups take place every five minutes following daily differential and weekly full (just as with point-in-time restore), there is an up-to-one hour delay before each of them is geo-replicated, which in turn dictates the resulting Recovery Point Objective. While Estimated Recovery Time is not supposed to exceed a twelve hour range, it is bound to vary depending on a number of different criteria, such as database size and performance level as well as resource contention (which is likely to occur following a regional disaster).
In order to carry out a Geo-Restore, you can use Azure PowerShell (with its Start-AzureSqlDatabaseRecovery cmdlet), REST API (by submitting Create Database Recovery Request), or Azure Classic Portal (via the BACKUPS tab of the server hosting the primary replica of the database). When using the portal during the restore process, you will have the option to create a new database on a new server residing in another Azure region (which hosts the geo-replicated storage account containing the backups).
This resiliency approach could be suitable for recovery from regional disasters assuming that your desired Recovery Time Objective matches or exceeds the Estimated Recovery Time. It might also come in handy during Disaster Recovery tests, since it has no negative impact on the production instance of the database.
- Standard Geo-Replication offers considerably enhanced business continuity capabilities comparing to Geo-Restore, with Estimated Recovery Time within the thirty second range and Recovery Point Objective smaller than five seconds, although as we pointed out earlier, it requires either the Standard or Premium pricing tier. This significant improvement is accomplished by maintaining a single, non-readable copy of the database in the secondary region, which you can fail over to in case the primary becomes unavailable. The secondary location is fixed (for a given primary location) based on the Disaster Recovery pairing rules enforced across global Azure infrastructure.
These characteristics make Standard Geo-Replication suitable for situations where tolerance for downtime and potential data loss are much lower than those offered by Geo-Restore. You can reap the resulting benefits during actual disaster recovery events. Testing this functionality involves invoking the failover process in the planned mode, which ensures full synchronization between the primary instance and its secondary replica before the switch takes effect (eliminating the possibility of data loss).
Enabling Standard Geo-Replication requires adding a secondary replica after the primary Standard or Premium pricing tier Azure SQL Database is deployed. Note that the secondary must reside in the paired region, which means that you need either an existing or a new server at that location. All necessary implementation steps can be accomplished via Azure PowerShell (by invoking the Start-AzureSqlDatabaseCopy cmdlet with the ContinuousCopy parameter), REST API (by initiating Start Database Copy), Azure Classic Portal (from the GEO-REPLICATION tab of the database) or Azure Portal (from the Geo-Replication feature in the Settings blade of the database).
- Active Geo-Replication offers the most advanced resiliency capabilities, with up to four read-only secondaries located in arbitrarily chosen regions (and, by implication, on different servers). While the Estimated Recovery Time and Recovery Point Objective values remain the same as with Standard Geo-Replication, you have the advantage of being able to fail over to any of the four locations. In addition, since each of the replicas operates in read-only mode, you not only can mitigate the impact of datacenter-wide disasters, but you also gain the ability to facilitate online application upgrades or migrations as well as offloading data analysis or reporting tasks from your production instance. However, as we pointed out earlier, all these benefits come at an extra price, since your database must be operating in the Premium pricing tier.
As with other recovery approaches, this one can be implemented by using Azure PowerShell (by invoking the Start-AzureSqlDatabaseCopy cmdlet with the ContinuousCopy parameter), REST API (by initiating Start Database Copy), Azure Classic Portal (from the GEO-REPLICATION tab of the database) or Azure Portal (from the Geo-Replication feature in the Settings blade of the database).
It is important to note that, regardless of the methodology, a failover in each of the scenarios outlined above will result in a change of the name of the server hosting the database. You will need to take this into account when designing your business continuity plan, since, as the result, you will need to update connection strings in database-bound applications accordingly. In addition, each recovery approach involves a manual step, primarily due to the fact that there is a potential for data loss (with the exception of a drill that can be carried out in the planned mode), which must be evaluated when deciding whether to invoke a disaster recovery. You should also consider how to handle maintaining the ability to recover from a disaster following a failover as well as determine the most efficient failback strategy. We will take a closer look at these considerations in one of the upcoming articles published on this forum.