You support a service that recently had an outage. The outage was caused by a new release that exhausted the service memory resources. You rolled back the release successfully to mitigate the impact on users. You are now in charge of the post-mortem for the outage. You want to follow Site Reliability Engineering practices when developing the post-mortem. What should you do?
A. Focus on developing new features rather than avoiding the outages from recurring.
B. Focus on identifying the contributing causes of the incident rather than the individual responsible for the cause.
C. Plan individual meetings with all the engineers involved. Determine who approved and pushed the new release to production.
D. Use the Git history to find the related code commit. Prevent the engineer who made that commit from working on production services.
Disclaimer
This is a practice question. There is no guarantee that this question will appear on the certification exam.
Answer
B
Explanation
According to Site Reliability Engineering (SRE) practices, the goal of a post-mortem is to identify the underlying causes of an incident so that the team can prevent it from recurring. This means looking for patterns and weaknesses in the system rather than for a specific person to blame. The focus is on learning and continuous improvement, which is why B is correct: options C and D both try to single out an individual, and option A ignores prevention altogether.