Runbooks
Database Failover
Runbook for handling database failover scenarios.
When to Use
Use this runbook when the primary database becomes unresponsive or shows degraded performance that cannot be resolved by restarting the service.
Prerequisites
- Database admin access
- Access to the cloud provider console
- Familiarity with the replication setup
Steps
- Confirm the primary database is unhealthy via monitoring.
- Check replication lag on the replica.
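- If lag is acceptable (under 5 seconds), initiate failover so that the replica becomes the new primary.
- Update the connection string in the application config to point at the new primary.
- Restart affected services so they pick up the updated connection string.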
- Verify application connectivity to the new primary.
- Investigate the root cause of the original failure.
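
The health check and lag threshold in the steps above can be sketched in Python as follows. This is a minimal illustration, not part of any existing tooling: the names `failover_decision`, `Decision`, and `MAX_LAG_SECONDS` are hypothetical, and the `HOLD` outcome is an assumption, since the steps do not say what to do when lag is 5 seconds or more. How the health flag and the lag value are obtained depends on your monitoring and database and is deliberately left out.

```python
from enum import Enum

# Lag threshold taken from the steps above (lag must be under 5 seconds).
MAX_LAG_SECONDS = 5.0


class Decision(Enum):
    NO_ACTION = "primary is healthy; no failover needed"
    FAIL_OVER = "primary is unhealthy and lag is acceptable; initiate failover"
    # Assumption: the runbook does not define this case, so the sketch
    # only flags it for a human to decide.
    HOLD = "primary is unhealthy but lag is too high; escalate before failing over"


def failover_decision(
    primary_healthy: bool,
    replica_lag_seconds: float,
    max_lag_seconds: float = MAX_LAG_SECONDS,
) -> Decision:
    """Map the two checks (primary health, replica lag) to a decision."""
    if primary_healthy:
        return Decision.NO_ACTION
    if replica_lag_seconds < max_lag_seconds:
        return Decision.FAIL_OVER
    return Decision.HOLD


if __name__ == "__main__":
    # Example inputs; in practice these come from monitoring.
    print(failover_decision(primary_healthy=False, replica_lag_seconds=2.0).value)
```

Keeping the decision as a pure function, separate from whatever collects the inputs, makes the threshold easy to review and to test without touching a live database.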
Post-Failover
- Monitor the new primary for stability.
- Rebuild the old primary as a new replica.
- Update documentation if the topology changed permanently.
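
One rough way to watch the new primary for stability is to probe its port repeatedly and record how often the connection succeeds. The sketch below uses only the Python standard library. The function names, the host and port in the usage example, and the choice of a plain TCP probe are all assumptions made for illustration. A successful TCP connection only shows that the port accepts connections, not that the database answers queries, so treat this as a supplement to real monitoring.

```python
import socket
import time


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def connection_success_ratio(
    host: str,
    port: int,
    attempts: int = 5,
    interval_seconds: float = 1.0,
    timeout: float = 2.0,
) -> float:
    """Probe host:port several times and return the fraction that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = 0
    for i in range(attempts):
        if can_connect(host, port, timeout=timeout):
            successes += 1
        if i < attempts - 1:
            time.sleep(interval_seconds)  # pause between probes
    return successes / attempts


if __name__ == "__main__":
    # Hypothetical host and port; replace with the new primary's address.
    ratio = connection_success_ratio("db-primary.example.internal", 5432)
    print(f"successful probes: {ratio:.0%}")
```

A ratio below 1.0 over a sustained window suggests the new primary is not yet stable and is worth investigating before the old primary is rebuilt as a replica.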