Hyperce Knowledge Base

Service Recovery

Runbook for recovering a crashed or unresponsive service.

When to Use

Use this runbook when a service is unresponsive, crash-looping, or failing health checks.

Diagnostic Steps

Check service logs for error messages.
Verify resource usage (CPU, memory, disk).
Check if dependent services are healthy.
Review recent deployments or config changes.
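The first three checks above can be partly scripted. The following is a minimal sketch, not a prescribed tool: the function names, the list of log level names, and the shape of the dependency checks (a mapping from name to a zero-argument callable) are all assumptions made for illustration.

```python
import shutil
from collections import Counter

# Assumed level names; adjust to whatever the service's logger emits.
ERROR_LEVELS = ("ERROR", "FATAL", "CRITICAL")


def count_error_lines(log_lines):
    """Count lines per error level, matching the level as a whole token."""
    counts = Counter()
    for line in log_lines:
        tokens = line.split()
        for level in ERROR_LEVELS:
            if level in tokens:
                counts[level] += 1
                break  # count each line at most once
    return counts


def disk_usage_percent(path="."):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def unhealthy_dependencies(checks):
    """Given {name: zero-argument callable returning bool}, list failing names."""
    return [name for name, check in checks.items() if not check()]
```

For example, `unhealthy_dependencies({"db": db_ok, "cache": cache_ok})` returns the names whose check returned False, where `db_ok` and `cache_ok` are placeholder health probes.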

Recovery Steps

Restart the service and monitor for recovery.
If the restart fails, roll back to the last known good deployment.
If the rollback fails, scale down to zero and investigate.
Check for resource exhaustion and increase limits if needed.
Once recovered, scale back up to normal capacity.
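The order of the first three steps can be sketched as a small function. This is an illustration under stated assumptions: `restart`, `rollback`, `scale_to_zero`, and `is_healthy` are hypothetical callables supplied by the caller, and a step is treated as failed when the health check does not pass afterwards. The runbook itself does not define how failure is detected.

```python
def recover(restart, rollback, scale_to_zero, is_healthy):
    """Walk the recovery steps in order and report which one ended the run.

    All four arguments are hypothetical zero-argument callables; a step
    counts as failed if is_healthy() returns False after it runs.
    """
    restart()
    if is_healthy():
        return "restarted"

    rollback()  # restart did not help: go back to the last known good deployment
    if is_healthy():
        return "rolled_back"

    scale_to_zero()  # rollback did not help: stop the service and investigate
    return "scaled_to_zero"
```

A real version would likely wait and poll between an action and its health check; that delay is omitted here to keep the control flow visible.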

Escalation

If the service cannot be recovered within 30 minutes, escalate to the on-call lead and open a P0 incident.
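The 30-minute rule can be expressed as a simple time check. The sketch below assumes the clock starts when the incident is first detected and that reaching exactly 30 minutes without recovery triggers escalation; the name `should_escalate` and its parameters are illustrative only.

```python
from datetime import datetime, timedelta

# The 30-minute window comes from the runbook; the constant name is an assumption.
ESCALATION_WINDOW = timedelta(minutes=30)


def should_escalate(incident_start, now, recovered):
    """Return True once the window has elapsed without recovery."""
    if recovered:
        return False
    return now - incident_start >= ESCALATION_WINDOW
```

For example, an incident detected at 10:00 and still unrecovered at 10:30 would be escalated, while one recovered at 10:20 would not.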
