You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application’s functionality, a data-intensive reporting feature, is consistently failing with an HTTP 500 error. When you investigate your application’s dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend’s persistent disk (PD). Now you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?
A. As the I/O wait times aggregated across all report generation backends
B. As the proportion of report generation requests that result in a successful response
C. As the application’s report generation queue size compared to a known-good threshold
D. As the reporting backend PD throughput capacity compared to a known-good threshold
Disclaimer
This is a practice question. There is no guarantee that this question will appear in the certification exam.
Answer
B
Explanation
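A. As the I/O wait times aggregated across all report generation backends.
(Not an availability SLI. I/O wait time is a cause-oriented resource metric that describes backend load, not whether users’ report requests succeed.)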
B. As the proportion of report generation requests that result in a successful response.
(This is a valid availability Service Level Indicator (SLI) for the report generation feature. It measures the percentage of report generation requests that complete successfully, which is what users actually experience: in this scenario, the failing requests returned HTTP 500 errors. As a simple ratio of good requests to total requests, it is easy to understand, monitor, and report on. Tracked over time, it reveals any problem that makes the feature unavailable, whatever the underlying cause, so you can diagnose and mitigate issues quickly. It also matches users’ expectations, since they want a high percentage of their report generation requests to succeed.
https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring/sli-metrics/overview)
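To make option B concrete, the sketch below computes such a ratio SLI over a window of request outcomes. It is a minimal illustration, not a Cloud Monitoring configuration: the function name `availability_sli` is hypothetical, and it assumes that any 5xx response counts as a failure while every other response counts as a success.

```python
from typing import Iterable


def availability_sli(status_codes: Iterable[int]) -> float:
    """Return the fraction of report generation requests that succeeded.

    Assumption (hypothetical convention): a 5xx status code is a failure;
    every other status code counts as a successful response.
    """
    total = 0
    good = 0
    for code in status_codes:
        total += 1
        if not 500 <= code <= 599:
            good += 1
    if total == 0:
        # No traffic in the window: the ratio is undefined, so refuse to guess.
        raise ValueError("no requests in the measurement window")
    return good / total


# Example window: 8 successful responses and 2 HTTP 500 errors.
window = [200] * 8 + [500] * 2
print(f"{availability_sli(window):.2%}")  # 80.00%
```

Because the SLI counts only request outcomes, it would have dropped during this incident and would still detect a future outage with an unrelated cause, which I/O wait, queue size, or PD throughput alone would not.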
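C. As the application’s report generation queue size compared to a known-good threshold.
(Not an availability SLI. Queue size correlated with the failures in this incident, but it is an internal signal: a long queue does not by itself mean requests are failing, and requests can fail for reasons the queue does not reflect.)
D. As the reporting backend PD throughput capacity compared to a known-good threshold.
(Not an availability SLI. PD throughput is a resource metric tied to the specific cause of this incident; it does not measure whether report requests succeed and would miss failures with other causes.)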