You are the on-call Site Reliability Engineer for a microservice that is deployed to a Google Kubernetes Engine (GKE) Autopilot cluster. Your company runs an online store that publishes order messages to Pub/Sub, and a microservice receives these messages and updates stock information in the warehousing system. A sales event caused an increase in orders, and the stock information is not being updated quickly enough. This is causing a large number of orders to be accepted for products that are out of stock. You check the metrics for the microservice and compare them to typical levels:
| Microservice Metrics | Typical State | Current State |
|---|---|---|
| Average CPU across all Pods | 20% of Pod limit | 30% of Pod limit |
| Average memory across all Pods | 10% of Pod limit | 10% of Pod limit |
| Pub/Sub subscription: Average oldest unacknowledged message age | 347 milliseconds | 8074 milliseconds |
| Pub/Sub subscription: Average undelivered messages | 5 messages | 14705 messages |
| Pub/Sub subscription: Average acknowledgement latency | 312 milliseconds | 354 milliseconds |
You need to ensure that the warehouse system accurately reflects product inventory at the time orders are placed and minimize the impact on customers. What should you do?
A. Decrease the acknowledgment deadline on the subscription.
B. Add a virtual queue to the online store that allows typical traffic levels.
C. Increase the number of Pod replicas.
D. Increase the Pod CPU and memory limits.
Disclaimer
This is a practice question. There is no guarantee that this question will appear in the certification exam.
Answer
C
Explanation
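A. Decrease the acknowledgment deadline on the subscription.
(Incorrect. A shorter deadline does not make the Pods process messages any faster. It only causes Pub/Sub to redeliver messages that are not acknowledged in time, which can add duplicate work to an already backlogged subscription.)
B. Add a virtual queue to the online store that allows typical traffic levels.
(Incorrect. This throttles customers during the sales event instead of clearing the backlog, so it does not minimize the impact on customers.)
C. Increase the number of Pod replicas.
(Correct. The metrics show a throughput problem, not a resource problem: undelivered messages rose from 5 to 14705 and the oldest unacknowledged message age rose from 347 to 8074 milliseconds, while acknowledgement latency barely changed (312 to 354 milliseconds). Each Pod is still processing individual messages at close to its usual speed, but there are too few subscribers to keep up with the publish rate. Adding Pod replicas scales the microservice out so that more messages are processed in parallel, the backlog drains, and stock information is updated quickly enough to avoid accepting orders for out-of-stock products.)
D. Increase the Pod CPU and memory limits.
(Incorrect. Average CPU is at 30% and average memory at 10% of the Pod limits, so the Pods are not resource constrained. Raising the limits would not increase message throughput.)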
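
The reasoning behind the answer can be checked with some simple arithmetic on the table from the question. The following is a minimal Python sketch, not an official diagnostic method: the metric values are copied from the table, while the dictionary keys, the `growth` helper, and the thresholds (80% headroom, 10x backlog growth, 1.5x latency growth) are hypothetical choices made only for illustration.

```python
# Metric values copied from the table in the question: (typical, current).
# The key names are made up for this sketch.
metrics = {
    "cpu_pct_of_limit": (20, 30),
    "memory_pct_of_limit": (10, 10),
    "oldest_unacked_age_ms": (347, 8074),
    "undelivered_messages": (5, 14705),
    "ack_latency_ms": (312, 354),
}


def growth(typical, current):
    """Return how many times larger the current value is than the typical one."""
    return current / typical


ratios = {name: growth(t, c) for name, (t, c) in metrics.items()}

# Assumed rule of thumb (not a GKE or Pub/Sub setting): Pods have headroom
# if current CPU and memory usage are both below 80% of the Pod limit.
HEADROOM_THRESHOLD_PCT = 80
pods_have_headroom = all(
    metrics[name][1] < HEADROOM_THRESHOLD_PCT
    for name in ("cpu_pct_of_limit", "memory_pct_of_limit")
)

# Assumed thresholds: the backlog counts as "growing" if both backlog size and
# oldest message age are more than 10x typical; per-message speed counts as
# "stable" if acknowledgement latency is less than 1.5x typical.
backlog_growing = (
    ratios["undelivered_messages"] > 10 and ratios["oldest_unacked_age_ms"] > 10
)
per_message_speed_stable = ratios["ack_latency_ms"] < 1.5

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x typical")

if pods_have_headroom and backlog_growing and per_message_speed_stable:
    print("Pattern suggests scaling out (more replicas), not scaling up (higher limits).")
```

Under these assumptions, the backlog is the only signal that has grown by orders of magnitude, while per-Pod CPU, memory, and acknowledgement latency stay close to typical levels. That pattern matches option C rather than option D.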