You are running an experiment to see whether your users like a new feature of a web application. Shortly after deploying the feature as a canary release, you receive a spike in the number of 500 errors sent to users, and your monitoring reports show increased latency. You want to quickly minimize the negative impact on users. What should you do first?
A. Roll back the experimental canary release.
B. Start monitoring latency, traffic, errors, and saturation.
C. Record data for the postmortem document of the incident.
D. Trace the origin of 500 errors and the root cause of increased latency.
Disclaimer
This is a practice question. There is no guarantee that this question will appear in the certification exam.
Answer
A
Explanation
A spike in 500 errors and increased latency shortly after a canary deployment points to the new release as the likely cause, so the priority is to minimize the impact on users. The first step is to roll back the experimental canary release. This removes the new feature from production and reverts the system to its previous state, which should reduce the number of errors and decrease latency.
After rolling back the canary release, you can keep monitoring latency, traffic, errors, and saturation (Option B) to confirm the effect of the rollback and to make sure that the system is stable. Monitoring is already in place in this scenario, since the monitoring reports are what revealed the increased latency, so starting it does not by itself reduce the impact on users.
You can also record data for the postmortem document of the incident (Option C) to learn from the incident and to improve future releases.
Lastly, you can trace the origin of 500 errors and the root cause of increased latency (Option D) after rolling back the canary release to understand what went wrong and to prevent similar issues from happening again in the future.
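The "roll back first, investigate later" reasoning can be illustrated with a small check that compares canary metrics against the stable baseline. This is only a minimal sketch: the `Metrics` class, the `should_roll_back` function, and both thresholds are hypothetical names and values chosen for illustration, not part of any monitoring product or of the exam material.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical snapshot of one release's traffic over a time window."""
    requests: int
    errors_5xx: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when the release received no traffic.
        return self.errors_5xx / self.requests if self.requests else 0.0


def should_roll_back(
    baseline: Metrics,
    canary: Metrics,
    max_error_rate_increase: float = 0.01,  # assumed threshold: +1 percentage point
    max_latency_ratio: float = 1.5,         # assumed threshold: 50% slower than baseline
) -> bool:
    """Return True if the canary is clearly worse than the baseline."""
    error_regression = (
        canary.error_rate - baseline.error_rate > max_error_rate_increase
    )
    latency_regression = (
        canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio
    )
    return error_regression or latency_regression


# Example values are made up for illustration.
baseline = Metrics(requests=10_000, errors_5xx=10, p95_latency_ms=120.0)
canary = Metrics(requests=500, errors_5xx=40, p95_latency_ms=310.0)

if should_roll_back(baseline, canary):
    print("Roll back the canary, then investigate and write the postmortem.")
```

The check deliberately makes no attempt to find the root cause. It only decides whether users are being hurt, which mirrors why Option A comes before Options C and D.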