Your company is designing its data lake on Google Cloud and wants to develop different ingestion pipelines to collect unstructured data from different sources. After the data is stored in Google Cloud, it will be processed in several data pipelines to build a recommendation engine for end users on the website. The structure of the data retrieved from the source systems can change at any time. The data must be stored exactly as it was retrieved for reprocessing purposes in case the data structure is incompatible with the current processing pipelines. You need to design an architecture to support the use case after you retrieve the data. What should you do?
A. Send the data through the processing pipeline, and then store the processed data in a BigQuery table for reprocessing.
B. Store the data on a BigQuery table. Design the processing pipelines to retrieve the data from the table.
C. Send the data through the processing pipeline, and then store the processed data in a Cloud Storage bucket for reprocessing.
D. Store the data in a Cloud Storage bucket. Design the processing pipelines to retrieve the data from the bucket.
Disclaimer
This is a practice question. There is no guarantee that this question will appear in the certification exam.
Answer
D
Explanation
A. Send the data through the processing pipeline, and then store the processed data in a BigQuery table for reprocessing.
(Ruled out. The data must be stored before it is processed. Storing only the processed output loses the original data that is needed for reprocessing.)
B. Store the data on a BigQuery table. Design the processing pipelines to retrieve the data from the table.
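(Ruled out. A BigQuery table requires a defined schema, so unstructured data whose structure can change at any time cannot be stored in it exactly as retrieved.)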
C. Send the data through the processing pipeline, and then store the processed data in a Cloud Storage bucket for reprocessing.
(Ruled out. The data must be stored before it is processed. Storing only the processed output loses the original data that is needed for reprocessing.)
D. Store the data in a Cloud Storage bucket. Design the processing pipelines to retrieve the data from the bucket.
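(Correct. The data must be stored exactly as it is retrieved, so any processing has to happen after it is stored. Cloud Storage accepts objects of any format without requiring a schema, and the raw objects remain available for reprocessing if the source structure changes.
This is the classic data lake ELT pattern: Extract -> Load -> Transform.)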
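The ELT flow behind option D can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not a Google Cloud implementation: an in-memory RawZone class stands in for the Cloud Storage bucket (with the google-cloud-storage client, put and get would roughly correspond to Blob.upload_from_string and Blob.download_as_bytes), and transform_v1, transform_v2, and the JSON record layouts are invented for the example. It shows why the data is stored first: when the source structure changes, the current pipeline fails on the new records, but the raw bytes are still intact and can be reprocessed once the pipeline is updated.

```python
import json
from typing import Callable, Dict, List, Tuple


class RawZone:
    """Hypothetical in-memory stand-in for a Cloud Storage bucket.

    Maps object names to raw bytes. In Google Cloud, put/get would be
    backed by a bucket instead of a dict (assumption for this sketch).
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        # Load step: store the payload exactly as retrieved, without parsing it.
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def list_names(self, prefix: str = "") -> List[str]:
        # Object names under a prefix, in a stable (sorted) order.
        return sorted(n for n in self._objects if n.startswith(prefix))


def transform_v1(raw: bytes) -> dict:
    # Hypothetical pipeline v1: only understands {"user": ..., "item": ...}.
    record = json.loads(raw)
    return {"user_id": record["user"], "item_id": record["item"]}


def transform_v2(raw: bytes) -> dict:
    # Hypothetical pipeline v2: also understands a newer nested "event" layout.
    record = json.loads(raw)
    if "event" in record:
        record = record["event"]
    return {"user_id": record["user"], "item_id": record["item"]}


def run_pipeline(
    zone: RawZone, prefix: str, transform: Callable[[bytes], dict]
) -> Tuple[List[dict], List[str]]:
    # Transform step: read raw objects and process them; the raw zone is never modified.
    processed: List[dict] = []
    failed: List[str] = []
    for name in zone.list_names(prefix):
        try:
            processed.append(transform(zone.get(name)))
        except (KeyError, ValueError):  # missing fields or invalid JSON
            failed.append(name)
    return processed, failed


if __name__ == "__main__":
    zone = RawZone()
    zone.put("clicks/2024-01-01/a.json", b'{"user": "u1", "item": "i9"}')
    # The source system changes its structure without notice.
    zone.put("clicks/2024-01-02/b.json", b'{"event": {"user": "u2", "item": "i3"}}')

    ok, bad = run_pipeline(zone, "clicks/", transform_v1)
    print("v1 processed:", ok, "failed:", bad)  # the nested record fails

    # Reprocess from the untouched raw data after updating the pipeline.
    ok, bad = run_pipeline(zone, "clicks/", transform_v2)
    print("v2 processed:", ok, "failed:", bad)
```

Because the load step never parses the payload, a structure change can only break the transform step, never the stored copy. With options A and C, only processed output is persisted, so the record that transform_v1 cannot parse would have no stored copy to reprocess.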