You have a Compute Engine instance that uses the default Debian image. The application hosted on this instance recently suffered a series of crashes that you weren’t able to debug in real time; the application process died suddenly every time. The application usually consumes 50% of the instance’s memory, and normally never more than 70%, but you suspect that a memory leak was responsible for the crashes. You want to validate this hypothesis. What should you do?
A. Go to Stackdriver’s Metric Explorer and look for the “compute.googleapis.com/guest/system/problem_count” metric for that instance. Examine its value for when the application crashed in the past.
B. In Stackdriver, create an uptime check for your application. Create an alert policy for that uptime check to be notified when your application crashes. When you receive an alert, use your usual debugging tools to investigate the behavior of the application in real time.
C. Install the Stackdriver Monitoring agent on the instance. Go to Stackdriver’s Metric Explorer and look for the “agent.googleapis.com/memory/percent_used” metric for that instance. Examine its value for when the application crashed in the past.
D. Install the Stackdriver Monitoring agent on the instance. Create an alert policy on the “agent.googleapis.com/memory/percent_used” metric for that instance to be alerted when the memory used is higher than 75%. When you receive an alert, use your debugging tools to investigate the behavior of the application in real time.
Disclaimer
This is a practice question. There is no guarantee that it will appear in the certification exam.
Answer
D
Explanation
guest/system/problem_count
The number of times a machine problem has occurred. Sampled every 60 seconds.
memory/percent_used
Percentage of memory used by each memory state. Summing the percentages of all states yields 100 percent. Sampled every 60 seconds.
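As a rough illustration of how this metric would be read, the sketch below scans a series of 60-second samples, checks that the per-state percentages sum to 100, and reports the first sample where the "used" state exceeds a threshold. The sample values, the state names other than "used", and the helper name first_breach are assumptions made for the example, not data from a real instance.

```python
# Hypothetical 60-second samples of memory/percent_used, keyed by memory state.
# The values are invented for illustration only.
SAMPLES = [
    {"used": 50.0, "free": 30.0, "cached": 15.0, "buffered": 5.0},
    {"used": 68.0, "free": 17.0, "cached": 12.0, "buffered": 3.0},
    {"used": 78.0, "free": 10.0, "cached": 10.0, "buffered": 2.0},
]


def first_breach(samples, threshold=75.0):
    """Return the index of the first sample whose 'used' share exceeds
    the threshold, or None if no sample does."""
    for index, states in enumerate(samples):
        # Per the metric description, all states should sum to 100 percent.
        total = sum(states.values())
        if abs(total - 100.0) > 0.01:
            raise ValueError(f"sample {index} sums to {total}, expected 100")
        if states["used"] > threshold:
            return index
    return None


breach_index = first_breach(SAMPLES)
print(breach_index)
```

With these invented samples, the third reading is the first one above the application's normal 70% ceiling and above the 75% threshold.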
A. Go to Stackdriver’s Metric Explorer and look for the “compute.googleapis.com/guest/system/problem_count” metric for that instance. Examine its value for when the application crashed in the past.
(A count of how many times a machine problem happened says nothing about memory usage, so it cannot confirm or rule out a memory leak.)
B. In Stackdriver, create an uptime check for your application. Create an alert policy for that uptime check to be notified when your application crashes. When you receive an alert, use your usual debugging tools to investigate the behavior of the application in real time.
(The uptime check notification arrives only after the application has already crashed, which is too late for real-time investigation.)
C. Install the Stackdriver Monitoring agent on the instance. Go to Stackdriver’s Metric Explorer and look for the “agent.googleapis.com/memory/percent_used” metric for that instance. Examine its value for when the application crashed in the past.
(The agent collects this metric only from the moment it is installed, so no memory data exists for the past crashes. Even for later crashes, the metric would show that memory usage rose but would not be enough on its own to debug the cause.)
D. Install the Stackdriver Monitoring agent on the instance. Create an alert policy on the “agent.googleapis.com/memory/percent_used” metric for that instance to be alerted when the memory used is higher than 75%. When you receive an alert, use your debugging tools to investigate the behavior of the application in real time.
(The 75% threshold sits above the application’s normal 70% ceiling, so the alert arrives while memory is climbing and before the process dies. This gives you the chance to investigate and debug the running application as the issue occurs.)
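The alert policy described in option D can be sketched as a JSON document. The Python below only builds and prints that document; it does not call any Google Cloud API. The field names follow the Cloud Monitoring AlertPolicy resource as commonly documented, and the instance ID, the 60-second duration, the mean aligner, and the helper name build_memory_alert_policy are assumptions for illustration. Notification channels are left out, so check the current documentation before using anything like this.

```python
import json

MEMORY_METRIC = "agent.googleapis.com/memory/percent_used"


def build_memory_alert_policy(instance_id, threshold_percent=75.0):
    """Build an alert policy body (as a dict) that triggers when the
    'used' memory state of one instance exceeds the threshold."""
    metric_filter = (
        f'metric.type="{MEMORY_METRIC}" '
        'AND resource.type="gce_instance" '
        f'AND resource.labels.instance_id="{instance_id}" '
        'AND metric.labels.state="used"'
    )
    return {
        "displayName": f"Memory used above {threshold_percent:g}%",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "memory/percent_used over threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_percent,
                    # Assumed: the condition must hold for one sample period.
                    "duration": "60s",
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_MEAN",
                        }
                    ],
                },
            }
        ],
    }


# "1234567890" is a placeholder instance ID, not a real resource.
policy = build_memory_alert_policy("1234567890")
print(json.dumps(policy, indent=2))
```

Restricting the filter to the "used" state matters because the metric reports one series per memory state, and the states together always sum to 100 percent.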