Your company has an application running on Google Cloud that collects data from thousands of physical devices that are globally distributed. Data is published to Pub/Sub and streamed in real time into an SSD Cloud Bigtable cluster via a Dataflow pipeline. The operations team informs you that your Cloud Bigtable cluster has a hotspot, and queries are taking longer than expected. You need to resolve the problem and prevent it from happening in the future. What should you do?
A. Advise your clients to use HBase APIs instead of NodeJS APIs.
B. Delete records older than 30 days.
C. Review your RowKey strategy and ensure that keys are evenly spread across the alphabet.
D. Double the number of nodes you currently have.
Disclaimer
This is a practice question. There is no guarantee that this question will appear in the certification exam.
Answer
C
Explanation
A. Advise your clients to use HBase APIs instead of NodeJS APIs.
(Ruled out. The client API does not determine how data is distributed across nodes; a hotspot is caused by the row key schema, not by whether clients use the HBase or NodeJS client library.)
B. Delete records older than 30 days.
(Ruled out. Deleting old records reduces storage but does not change the key schema; if new writes still concentrate on a narrow key range, the hotspot remains.)
C. Review your RowKey strategy and ensure that keys are evenly spread across the alphabet.
(https://cloud.google.com/bigtable/docs/schema-design#row-keys
Cloud Bigtable stores rows in lexicographic order by row key and splits the key space across nodes. If reads and writes concentrate on a narrow range of keys, for example because keys start with a timestamp or another monotonically increasing value, all of that traffic lands on a single node and creates a hotspot, which slows queries. Reviewing your row key strategy and ensuring keys are evenly spread across the key space distributes load across the cluster and improves query performance. Adding nodes or optimizing query patterns may mitigate symptoms, but neither deleting records older than 30 days nor switching clients from NodeJS to HBase APIs addresses the underlying key distribution problem.)
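To make the row key point concrete, here is a minimal sketch (plain Python, no Bigtable client required) contrasting a timestamp-first key, which funnels all new writes to one node, with a key that leads with a high-cardinality device ID plus a short hash salt. The function names and key layout are illustrative assumptions, not an official schema.

```python
import hashlib

def hotspot_key(timestamp_ms: int, device_id: str) -> str:
    # Anti-pattern: a monotonically increasing prefix means every new
    # write sorts to the end of the table, hitting a single node.
    return f"{timestamp_ms}#{device_id}"

def distributed_key(device_id: str, timestamp_ms: int) -> str:
    # Better: lead with a high-cardinality field (the device ID),
    # prefixed with a short hash salt so keys spread across the
    # key space even if device IDs themselves cluster alphabetically.
    salt = hashlib.md5(device_id.encode()).hexdigest()[:4]
    return f"{salt}#{device_id}#{timestamp_ms}"
```

With this layout, consecutive writes from thousands of devices land on many different tablets, while all rows for one device stay contiguous and can still be scanned efficiently with a prefix read.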
D. Double the number of nodes you currently have.
(This will not help. Because a hotspot concentrates traffic on a narrow key range served by one node, doubling the cluster size adds cost without redistributing that load.)