You support a production service that runs on a single Compute Engine instance. You regularly need to spend time on recreating the service by deleting the crashing instance and creating a new instance based on the relevant image. You want to reduce the time spent performing manual operations while following Site Reliability Engineering principles. What should you do?
A. File a bug with the development team so they can find the root cause of the crashing instance.
B. Create a Managed Instance Group with a single instance and use health checks to determine the system status.
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to determine the system status.
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the crashed instance promptly after it crashed.
Disclaimer
This is a practice question. There is no guarantee that it will appear on the certification exam.
Answer
B
Explanation
A. File a bug with the development team so they can find the root cause of the crashing instance.
(Finding the root cause is worthwhile, but filing a bug does not automate recovery, so the toil is not reduced.)
B. Create a Managed Instance Group with a single instance and use health checks to determine the system status.
(https://cloud.google.com/compute/docs/instance-groups?hl=en#autohealing
B is the best option for reducing the time spent on manual operations while following Site Reliability Engineering principles.
A Managed Instance Group (MIG) is a Compute Engine feature that creates, scales, and deletes instances according to the policies you set. With a single-instance MIG and an autohealing health check, the group detects when the instance becomes unhealthy and recreates it from the instance template without manual intervention. This removes the recurring toil of recreating the service and reduces the risk of human error. A single-instance group still has a brief outage while the replacement boots, but recovery no longer waits on a person.)
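As a rough illustration of option B, the sketch below assembles the two gcloud commands that would typically set this up: an HTTP health check and a one-instance MIG that uses it for autohealing. The resource names (svc-check, svc-mig, svc-template), the zone, the port, the request path, and the timing values are all hypothetical placeholders, not values from the question. The script only builds and prints the commands; it does not run them.

```python
import shlex


def build_autohealing_commands(mig, template, health_check, zone,
                               port=80, request_path="/", initial_delay_s=300):
    """Return gcloud argument lists for a one-instance MIG with autohealing.

    All names and timing values are illustrative assumptions; adjust them
    to the real service before running anything.
    """
    # Health check: the instance is marked unhealthy after 3 failed probes in a row.
    create_check = [
        "gcloud", "compute", "health-checks", "create", "http", health_check,
        "--port", str(port),
        "--request-path", request_path,
        "--check-interval", "30s",
        "--timeout", "10s",
        "--unhealthy-threshold", "3",
        "--healthy-threshold", "1",
    ]
    # MIG of size 1: an unhealthy instance is recreated from the template.
    # --initial-delay gives a new instance time to boot before it is judged.
    create_mig = [
        "gcloud", "compute", "instance-groups", "managed", "create", mig,
        "--size", "1",
        "--template", template,
        "--zone", zone,
        "--health-check", health_check,
        "--initial-delay", str(initial_delay_s),
    ]
    return [create_check, create_mig]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in build_autohealing_commands(
            "svc-mig", "svc-template", "svc-check", "us-central1-a"):
        print(shlex.join(cmd))
```

This assumes an instance template built from the relevant image already exists, and that a firewall rule lets the health check probes reach the instance. The initial delay should be longer than the service's normal startup time; otherwise the group may recreate an instance that is still booting.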
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to determine the system status.
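(A load balancer health check only stops traffic from being routed to an unhealthy instance. It does not recreate the crashing instance, so the toil is not reduced.)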
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the crashed instance promptly after it crashed.
(Alerts shorten the response time, but a person must still recreate the instance by hand, so the toil is not reduced.)