Google Cloud Professional DevOps Engineer Q84

This is post 84 of 120 in the series “Google Cloud Professional DevOps Engineer Practice Questions”

You encounter a large number of outages in the production systems you support. You receive alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that are automatically restarted within a minute. You want to set up a process that would prevent staff burnout while following Site Reliability Engineering practices. What should you do?

A. Eliminate unactionable alerts.

B. Create an incident report for each of the alerts.

C. Distribute the alerts to engineers in different time zones.

D. Redefine the related Service Level Objective so that the error budget is not exhausted.

Disclaimer

This is a practice question. There is no guarantee that this question will appear in the certification exam.

Answer

Explanation

A. Eliminate unactionable alerts.
(Eliminate bad monitoring: unactionable alerts, i.e., spam.
To follow SRE practice, we should eliminate unactionable alerts, which serve no purpose, in order to increase alert precision. Here the unhealthy systems are restarted automatically within a minute, so the alerts require no human action.
https://cloud.google.com/blog/products/management-tools/meeting-reliability-challenges-with-sre-principles)
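As a rough illustration of what "unactionable" means in this scenario, the sketch below routes alerts so that only those needing human action page someone. It is not tied to any Google Cloud API. The `Alert` type, the `SELF_HEAL_WINDOW` threshold, and the `should_page` and `route` functions are all hypothetical names chosen for this example, and the one-minute window is taken from the question text.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Alert:
    """Hypothetical alert record, for illustration only."""
    name: str
    outage_duration: timedelta  # how long the system stayed unhealthy
    auto_recovered: bool        # True if an automatic restart fixed it


# Assumption: outages that self-heal within this window need no human action.
SELF_HEAL_WINDOW = timedelta(minutes=1)


def should_page(alert: Alert) -> bool:
    """Return True only when the alert needs a human to act."""
    if alert.auto_recovered and alert.outage_duration <= SELF_HEAL_WINDOW:
        return False  # unactionable: the system already fixed itself
    return True


def route(alerts):
    """Split alerts into those that page on-call and those that are only logged."""
    pages = [a for a in alerts if should_page(a)]
    logged = [a for a in alerts if not should_page(a)]
    return pages, logged


if __name__ == "__main__":
    sample = [
        Alert("vm-restart", timedelta(seconds=40), True),
        Alert("db-down", timedelta(minutes=10), False),
    ]
    pages, logged = route(sample)
    print("page:", [a.name for a in pages])
    print("log only:", [a.name for a in logged])
```

Under these assumptions, a restart that completes within a minute is still recorded for later review, but nobody is woken up for it.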

~~B. Create an incident report for each of the alerts.~~

~~C. Distribute the alerts to engineers in different time zones.~~

~~D. Redefine the related Service Level Objective so that the error budget is not exhausted.~~
