An Azure-native disaster recovery service, previously known as Microsoft Azure Hyper-V Recovery Manager.
For redundancy of both application and data in another Azure region, use geo-redundant platform services and design the app to fail over between regions.
Key elements:
- Data redundancy across regions
- Use Azure Storage with geo-redundant options:
- GRS (Geo-redundant storage): Data is replicated synchronously three times within a single location in the primary region (LRS), then asynchronously to a single location in the paired secondary region, where it is again stored with LRS. Provides at least 16 nines (99.99999999999999%) of durability over a given year.
- GZRS (Geo-zone-redundant storage): Data is replicated synchronously across three availability zones in the primary region (ZRS), then asynchronously to a single location in the secondary region, where it is stored with LRS. Recommended for applications that need maximum consistency, durability, and availability.
- To read from the secondary region without initiating a failover, use RA-GRS or RA-GZRS; the application can then keep serving reads from the secondary endpoint while the primary region has problems.
- Because replication to the secondary region is asynchronous, the Recovery Point Objective (RPO) is nonzero: the most recent writes may be lost if the primary region cannot be recovered. Azure Storage Geo Priority Replication can keep the RPO for block blobs at or below 15 minutes.
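The RPO trade-off above can be reasoned about with the account's Last Sync Time: writes committed in the primary region at or before that timestamp are available in the secondary, while later writes may be lost if the primary cannot be recovered. The following is a minimal Python sketch under that assumption; the `Write` record, `writes_at_risk`, and `current_rpo` are hypothetical helpers, and a real application would read the Last Sync Time from the account's geo-replication statistics instead of hard-coding it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Write:
    """Hypothetical record of one write acknowledged by the primary region."""
    blob_name: str
    committed_at: datetime


def writes_at_risk(writes: list[Write], last_sync_time: datetime) -> list[Write]:
    """Return the writes that are not guaranteed to exist in the secondary region.

    Everything committed at or before last_sync_time has been replicated;
    anything later may be lost if the primary region is unrecoverable.
    """
    return [w for w in writes if w.committed_at > last_sync_time]


def current_rpo(last_sync_time: datetime, now: datetime) -> timedelta:
    """Worst-case data-loss window if the primary region were lost at `now`."""
    return max(now - last_sync_time, timedelta(0))


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    log = [
        Write("a.txt", t0),
        Write("b.txt", t0 + timedelta(minutes=10)),
        Write("c.txt", t0 + timedelta(minutes=20)),
    ]
    last_sync = t0 + timedelta(minutes=12)  # example value, not real telemetry
    print([w.blob_name for w in writes_at_risk(log, last_sync)])  # ['c.txt']
    print(current_rpo(last_sync, t0 + timedelta(minutes=25)))     # 0:13:00
```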
- File shares across regions
- For Azure Files (HDD SMB shares), use:
- GRS: Synchronous triple replication with LRS in the primary region, then asynchronous replication to a single location in the secondary region, again triple-replicated with LRS.
- GZRS: Synchronous replication across three availability zones in the primary region (ZRS), plus asynchronous replication to a single location in the secondary region, triple-replicated with LRS.
- GRS/GZRS for Azure Files provide at least 16 nines of durability and protect against regional outages. Failover relies on system snapshots taken every 15 minutes, so the share state after failover is that of the latest replicated snapshot.
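The snapshot-based failover behavior for Azure Files can be pictured with a simplified model: system snapshots are taken in the primary region, each one becomes usable in the secondary only after it has fully replicated, and failover restores the newest usable snapshot. The Python sketch below assumes exactly that; `SystemSnapshot` and `snapshot_for_failover` are hypothetical names, not part of any Azure API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class SystemSnapshot:
    taken_at: datetime                 # when the snapshot was taken in the primary region
    replicated_at: Optional[datetime]  # when it finished replicating; None = still in flight


def snapshot_for_failover(
    snapshots: Sequence[SystemSnapshot], failover_at: datetime
) -> Optional[SystemSnapshot]:
    """Return the newest snapshot fully replicated to the secondary by failover_at."""
    usable = [
        s for s in snapshots
        if s.replicated_at is not None and s.replicated_at <= failover_at
    ]
    return max(usable, key=lambda s: s.taken_at, default=None)


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    snaps = [
        SystemSnapshot(base, base + timedelta(minutes=3)),
        SystemSnapshot(base + timedelta(minutes=15), base + timedelta(minutes=19)),
        SystemSnapshot(base + timedelta(minutes=30), base + timedelta(minutes=34)),
    ]
    failover_at = base + timedelta(minutes=40)
    chosen = snapshot_for_failover(snaps, failover_at)
    # Changes made after chosen.taken_at are not present in the failed-over share.
    print(chosen.taken_at.time(), failover_at - chosen.taken_at)  # 12:30:00 0:10:00
```

Treating in-flight snapshots as unusable is what makes the restored share lag the primary: anything written after the chosen snapshot's `taken_at` does not survive the failover.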
- Application design for multi-region
- Deploy the application in at least two regions (active/active or active/passive).
- Use geo-redundant data stores (as above) or database technologies that support geo-replication (for example, Azure SQL Database or Azure Cosmos DB with geo-replication) so the app in the secondary region can access recently replicated data.
- Use a global routing/failover mechanism (for example, Azure Traffic Manager or Azure Front Door) to direct user traffic to the healthy region during a regional fault.
- Design the app to tolerate eventual consistency when reading from the secondary region (RA-GRS/RA-GZRS): add retry logic, and if slightly stale data is acceptable, tell users when they may be seeing it.
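The retry-then-fall-back pattern for RA-GRS/RA-GZRS reads might look like the following Python sketch. It relies on the convention that the read-only secondary endpoint adds `-secondary` to the account name; `fetch` is a hypothetical transport callable (a thin wrapper around whatever HTTP or SDK client the app uses), and the retry count and backoff values are illustrative, not recommendations. The function returns a flag so the caller can tell users the data may be slightly stale.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: takes an endpoint URL, returns the blob bytes,
# and raises ConnectionError/TimeoutError on transient failures.
Fetch = Callable[[str], bytes]


def blob_endpoint(account: str, secondary: bool = False) -> str:
    """Blob service endpoint; RA-GRS/RA-GZRS expose the secondary as '<account>-secondary'."""
    suffix = "-secondary" if secondary else ""
    return f"https://{account}{suffix}.blob.core.windows.net"


def read_with_fallback(
    fetch: Fetch,
    account: str,
    retries: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bytes, bool]:
    """Read from the primary with exponential backoff, then fall back to the secondary.

    Returns (data, possibly_stale); possibly_stale is True when the data came
    from the asynchronously replicated secondary region.
    """
    for attempt in range(retries):
        try:
            return fetch(blob_endpoint(account)), False
        except (ConnectionError, TimeoutError):
            if attempt < retries - 1:
                sleep(backoff_s * 2 ** attempt)  # 0.5 s, 1.0 s, ... between attempts
    # Primary kept failing: the secondary may lag the primary, so flag the result.
    return fetch(blob_endpoint(account, secondary=True)), True
```

The Azure Storage client libraries also ship retry policies that can retry reads against the secondary endpoint (for example, a `retry_to_secondary` option in the Python SDK's retry classes), which is usually preferable to hand-rolled logic; check the SDK's retry-policy documentation for the exact options.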
- Failover behavior
- For Azure Storage accounts using GRS/GZRS:
- Under normal conditions, reads and writes go to the primary region.
- If the primary region becomes unavailable and cannot be recovered, initiate a storage account failover. DNS is updated so the secondary region becomes the new primary, and read/write access is restored there.
- For Azure Files with GRS/GZRS, the file share state after failover comes from the latest system snapshot replicated to the secondary region, so it can be about 15 minutes, or occasionally slightly more, behind the primary.
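The endpoint behavior during a storage account failover can be summarized in a small model: clients keep using the same hostname, and failover changes only which region that name resolves to. This Python sketch is illustrative; the `GeoStorageAccount` class and the region names are hypothetical, and it assumes an unplanned failover, after which the account is locally redundant in the new primary region until geo-redundancy is configured again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoStorageAccount:
    name: str
    primary_region: str
    secondary_region: Optional[str]  # None once the account is no longer geo-replicated

    @property
    def hostname(self) -> str:
        # The hostname clients use never changes; only its DNS target does.
        return f"{self.name}.blob.core.windows.net"

    def resolve(self) -> str:
        """Region that currently serves reads and writes for self.hostname."""
        return self.primary_region

    def failover(self) -> None:
        """Point the primary endpoint at the former secondary region.

        Assumption: unplanned failover, so no secondary remains afterwards
        (the account is locally redundant until geo-redundancy is re-enabled).
        """
        if self.secondary_region is None:
            raise RuntimeError("no secondary region to fail over to")
        self.primary_region = self.secondary_region
        self.secondary_region = None


if __name__ == "__main__":
    acct = GeoStorageAccount("contosodata", "eastus", "westus")  # example names
    print(acct.hostname, "->", acct.resolve())  # contosodata.blob.core.windows.net -> eastus
    acct.failover()
    print(acct.hostname, "->", acct.resolve())  # contosodata.blob.core.windows.net -> westus
```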
By combining multi-region application deployment with geo-redundant storage (GRS/GZRS/RA-GRS/RA-GZRS) and appropriate failover routing, both application and data remain available from another region if the first region experiences a fault.
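For the routing side, an active/passive deployment behind a priority-based router (similar in spirit to the Priority routing method in Azure Traffic Manager, where a lower priority value is preferred) reduces to sending traffic to the most preferred healthy region. The Python sketch below assumes health-probe results are already available as booleans; `RegionEndpoint` and `pick_region` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RegionEndpoint:
    region: str
    priority: int  # lower value = preferred (as in Traffic Manager priority routing)
    healthy: bool  # latest health-probe result, assumed to be supplied by a monitor


def pick_region(endpoints: Sequence[RegionEndpoint]) -> Optional[RegionEndpoint]:
    """Return the most preferred healthy endpoint, or None if every region is down."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


if __name__ == "__main__":
    normal = [RegionEndpoint("eastus", 1, True), RegionEndpoint("westus", 2, True)]
    outage = [RegionEndpoint("eastus", 1, False), RegionEndpoint("westus", 2, True)]
    print(pick_region(normal).region)  # eastus
    print(pick_region(outage).region)  # westus
```

In an active/active deployment the same idea applies, but the router would spread traffic across all healthy regions (for example, by performance or weight) instead of picking a single one.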
References: