Jinal Desai

My thoughts and learnings

Study material for Google Professional Data Engineer exam


A Data Engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A Data Engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.   – Google


Google Cloud Professional Data Engineers are currently in high demand for their ability to maintain the process of accumulating, converting, and publishing data with a certain level of expertise.

For this exam, I recommend having strong experience with Google Cloud and its various data-focused services. If you don’t have strong experience and are still targeting this certification, no worries. This article is for everybody who aspires to the certificate.

What Google expects from you in this certification exam is: “Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.”

Before you start going through the links I’ve compiled in this article, scan through this link to understand the various topics covered in the certification exam.

If you prefer video courses for learning, I suggest the following courses.

Video Courses and Labs

If you prefer reading, you can go through the following links. I recommend learning a few topics through reading and hands-on practice; video training alone won’t help much.

Reading

First of all, I recommend going through the Data Dossier prepared by Matthew, which is freely available online. The second step is to learn SQL thoroughly, with a deep understanding of advanced query mechanisms. You can also take the free crash course provided by Google, “Crash Course in Machine Learning”.
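
As a small illustration of the kind of advanced query worth practicing, here is a minimal sketch of a window function that computes a running total. It uses Python's built-in sqlite3 module as a local stand-in (not BigQuery), and the `sales` table, its columns, and the sample rows are hypothetical names made up for this example. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3


def running_totals(rows):
    """Return (day, amount, running_total) tuples ordered by day.

    `rows` is a list of (day, amount) pairs. The table and column
    names are hypothetical and exist only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # SUM(...) OVER (ORDER BY day) accumulates the amounts
        # from the first day up to the current row.
        query = """
            SELECT day,
                   amount,
                   SUM(amount) OVER (ORDER BY day) AS running_total
            FROM sales
            ORDER BY day
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("2024-01-01", 10), ("2024-01-02", 5), ("2024-01-03", 7)]
    for row in running_totals(sample):
        print(row)
```

The same `SUM(...) OVER (ORDER BY ...)` pattern is standard SQL, so it is a reasonable thing to practice locally before trying it on larger data.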

A few more links, by topic:

GENERAL https://github.com/ml874/Data-Engineering-on-GCP-Cheatsheet GCP Cheatsheet (data_engineering_on_GCP.pdf)
GENERAL https://medium.com/@hello_92179 GCP cheat sheets by Yaron Hollander
GCE https://cloud.google.com/compute/ Compute Engine – Scalable, High-Performance Virtual Machines
GCE https://cloud.google.com/compute/docs/instances/preemptible Preemptible VM Instances
GCE https://cloud.google.com/compute/docs/disks/ Storage Options
GCE https://cloud.google.com/compute/docs/instance-groups/ Instance Groups
GCE https://cloud.google.com/compute/docs/startupscript Running Startup Scripts
GCE https://cloud.google.com/compute/docs/shutdownscript Running Shutdown Scripts
GCE https://cloud.google.com/compute/docs/sustained-use-discounts Sustained Use Discounts
Networking https://cloud.google.com/vpc/docs/overview Overview of Virtual Private Cloud
Networking https://cloud.google.com/vpc/docs/vpc Virtual Private Cloud (VPC) Network Overview
Networking https://cloud.google.com/vpc/docs/advanced-vpc Advanced VPC Concepts
Networking https://cloud.google.com/vpc/docs/shared-vpc Shared VPC Overview
Networking https://cloud.google.com/vpc/docs/vpc-peering VPC Network Peering
Networking https://cloud.google.com/vpc/docs/private-access-options Private Access Options for Services
Networking https://cloud.google.com/vpn/docs/concepts/overview Cloud VPN Overview
Networking https://cloud.google.com/interconnect/docs/concepts/overview Cloud Interconnect Overview
Networking https://cloud.google.com/interconnect/docs/how-to/choose-type Choose an interconnect type
Networking https://cloud.google.com/nat/docs/overview Cloud NAT
GCS https://cloud.google.com/appengine/docs/standard/java/googlecloudstorageclient/understanding-storage-features Understanding Google Cloud Storage Features
GCS https://cloud.google.com/storage/docs/key-terms GCS Key Terms
GCS https://cloud.google.com/storage/docs/storage-classes Storage Classes
GCS https://cloud.google.com/storage/docs/bucket-locations Bucket Locations
GCS https://cloud.google.com/storage/docs/encryption/ Data encryption options
GCS https://cloud.google.com/storage/docs/metadata Object metadata
GCS https://cloud.google.com/storage/docs/pubsub-notifications Cloud Pub/Sub notifications for Cloud Storage
GCS https://cloud.google.com/storage/docs/object-change-notification Object change notification
GCS https://cloud.google.com/storage/docs/object-versioning Object Versioning
GCS https://cloud.google.com/storage/docs/lifecycle Object Lifecycle Management
GCS https://cloud.google.com/storage/docs/requester-pays Requester Pays
GCS https://cloud.google.com/storage/docs/bucket-lock Retention policies using Bucket Lock
GCS https://cloud.google.com/storage/docs/transcoding Transcoding of gzip-compressed files
GCS https://cloud.google.com/storage/docs/cross-origin Cross-origin resource sharing (CORS)
GCS https://cloud.google.com/storage/docs/collaboration Sharing and Collaboration
GCS https://cloud.google.com/storage/docs/access-control/ Access control options
GCS https://cloud.google.com/storage/docs/access-control/iam Cloud Identity and Access Management
GCS https://cloud.google.com/storage/docs/access-control/lists Access Control Lists (ACLs)
GCS https://cloud.google.com/storage/docs/access-control/cookie-based-authentication Performing authenticated browser downloads
GCS https://cloud.google.com/storage/docs/access-control/signed-urls Signed URLs
GCS https://cloud.google.com/storage/docs/xml-api/post-object POST Object
GCS https://cloud.google.com/storage/docs/composite-objects Composite Objects and Parallel Uploads
GCS https://cloud.google.com/storage-transfer/docs/overview Storage Transfer Service Overview
GCS https://cloud.google.com/storage/docs/best-practices Best practices for Cloud Storage
Cloud SQL https://cloud.google.com/sql/ Cloud SQL
Cloud SQL https://cloud.google.com/sql/docs/mysql/high-availability Overview of the High Availability Configuration
Cloud SQL https://cloud.google.com/sql/docs/mysql/external-connection-methods Connection options for external applications
Cloud SQL https://cloud.google.com/sql/docs/mysql/sql-proxy About the Cloud SQL Proxy
Cloud SQL https://cloud.google.com/sql/docs/mysql/replication/ Replication options
Cloud SQL https://cloud.google.com/sql/docs/mysql/import-export/ Best practices for importing and exporting data
Cloud SQL https://cloud.google.com/sql/docs/mysql/backup-recovery/backups Overview of Backups
Cloud SQL https://cloud.google.com/sql/docs/mysql/backup-recovery/restore Overview of Restoring an Instance
Cloud SQL https://cloud.google.com/sql/faq Cloud SQL FAQ
BIGQUERY https://cloud.google.com/bigquery/what-is-bigquery What is BigQuery?
BIGQUERY https://cloud.google.com/bigquery/docs/projects Projects
BIGQUERY https://cloud.google.com/bigquery/docs/storing-data Storing data
BIGQUERY https://cloud.google.com/bigquery/docs/loading-data Introduction to loading data into BigQuery
BIGQUERY https://cloud.google.com/bigquery/streaming-data-into-bigquery Streaming Data into BigQuery
BIGQUERY https://cloud.google.com/bigquery/docs/loading-data-local Loading data from a local data source
BIGQUERY https://cloud.google.com/bigquery/docs/loading-data-cloud-storage Introduction to loading data from Cloud Storage
BIGQUERY https://cloud.google.com/bigquery/docs/partitioned-tables Partitioned Tables
BIGQUERY https://cloud.google.com/bigquery/docs/clustered-tables Introduction to clustered tables
BIGQUERY https://cloud.google.com/bigquery/pricing#queries Query pricing
BIGQUERY https://cloud.google.com/bigquery/docs/exporting-data Exporting table data
BIGQUERY https://cloud.google.com/bigquery/external-data-sources Introduction to external data sources
BIGQUERY https://cloud.google.com/bigquery/docs/access-control Access control
BIGQUERY https://cloud.google.com/bigquery/docs/dataset-access-controls Controlling access to datasets
BIGQUERY https://cloud.google.com/bigquery/docs/scan-with-dlp Using Cloud DLP to scan BigQuery data
BIGQUERY https://cloud.google.com/bigquery/docs/customer-managed-encryption Protecting data with Cloud KMS keys
BIGQUERY https://cloud.google.com/bigquery/docs/slots Slots
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-costs BigQuery best practices: Controlling costs
BIGQUERY https://cloud.google.com/bigquery/docs/updating-datasets Updating dataset properties
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-storage BigQuery Best practices: Optimizing storage
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-performance-input Managing input data and data sources
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-performance-communication Optimizing communication between slots
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-performance-compute Optimizing query computation
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-performance-output Managing query outputs
BIGQUERY https://cloud.google.com/bigquery/docs/best-practices-performance-patterns Avoiding SQL anti-patterns
BIGQUERY https://cloud.google.com/bigquery/query-plan-explanation Query plan and timeline
BIGQUERY https://cloud.google.com/bigquery/docs/views-intro Introduction to views
BIGQUERY https://cloud.google.com/bigquery/docs/view-access-controls Controlling access to views
BIGQUERY https://cloud.google.com/bigquery/docs/authorized-views Creating authorized views
BIGQUERY https://cloud.google.com/bigquery/docs/transfer-service-overview Introduction to BigQuery Data Transfer Service
BIGQUERY https://cloud.google.com/bigquery/sql-reference/data-manipulation-language Data Manipulation Language
BIGQUERY https://cloud.google.com/bigquery/docs/cached-results Using cached query results
BIGQUERY https://cloud.google.com/bigquery/docs/querying-wildcard-tables Querying multiple tables using a wildcard table
BIGTABLE https://cloud.google.com/bigtable/docs/overview Overview of Cloud Bigtable
BIGTABLE https://cloud.google.com/bigtable/docs/instances-clusters-nodes Instances, clusters, and nodes
BIGTABLE https://cloud.google.com/bigtable/docs/schema-design Designing Your Schema
BIGTABLE https://cloud.google.com/bigtable/docs/schema-design-time-series Schema Design for Time Series Data
BIGTABLE https://cloud.google.com/bigtable/docs/access-control Access Control
BIGTABLE https://cloud.google.com/bigtable/docs/replication-overview Overview of Replication
BIGTABLE https://cloud.google.com/bigtable/docs/replication-settings Examples of Replication Settings
BIGTABLE https://cloud.google.com/bigtable/docs/app-profiles Application Profiles
BIGTABLE https://cloud.google.com/bigtable/docs/failovers Failovers
BIGTABLE https://cloud.google.com/bigtable/docs/keyvis-overview Overview of Key Visualizer
BIGTABLE https://cloud.google.com/bigtable/docs/keyvis-patterns Heatmap Patterns
BIGTABLE https://cloud.google.com/bigtable/docs/performance Understanding Cloud Bigtable performance
BIGTABLE https://cloud.google.com/bigtable/docs/choosing-ssd-hdd Choosing Between SSD and HDD Storage
DB COMPARISON http://stackoverflow.com/questions/30085326/google-cloud-bigtable-vs-google-cloud-datastore Google Cloud Bigtable vs Google Cloud Datastore
DB COMPARISON https://db-engines.com/en/system/Google+Cloud+Bigtable%3BGoogle+Cloud+Datastore System Properties Comparison Google Cloud Bigtable vs. Google Cloud Datastore
DB COMPARISON http://terrenceryan.com/blog/index.php/when-to-pick-google-bigtable-vs-other-cloud-platform-databases/ When to Pick Google Bigtable vs Other Cloud Platform Databases
CLOUD DATAFLOW https://cloud.google.com/dataflow/docs/concepts/beam-programming-model Programming model for Apache Beam
DATAFLOW https://beam.apache.org/documentation/programming-guide/ Apache Beam Programming Guide
DATAFLOW https://cloud.google.com/dataflow/docs/concepts/security-and-permissions Cloud Dataflow security and permissions
DATAFLOW https://cloud.google.com/dataflow/docs/concepts/access-control Cloud Dataflow Access Control Guide
DATAFLOW https://stackoverflow.com/questions/33518104/google-dataflow-vs-apache-spark Google Dataflow vs Apache Spark
DATAFLOW / STACKDRIVER https://cloud.google.com/blog/big-data/2017/03/monitoring-and-improving-your-google-cloud-dataflow-pipelines-with-google-stackdriver.html Monitoring and improving your Google Cloud Dataflow pipelines with Google Stackdriver
DATAPROC https://cloud.google.com/dataproc/docs/concepts/overview What is Google Cloud Dataproc?
DATAPROC https://cloud.google.com/dataproc/docs/concepts/compute/custom-machine-types Custom machine types
DATAPROC https://cloud.google.com/dataproc/docs/concepts/compute/dataproc-local-ssds Cloud Dataproc Local SSDs
DATAPROC https://cloud.google.com/dataproc/docs/concepts/preemptible-vms Preemptible VMs
DATAPROC https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions Initialization actions
DATAPROC https://cloud.google.com/dataproc/docs/resources/faq Cloud Dataproc FAQ
DATAPROC https://cloud.google.com/dataproc/docs/connectors/cloud-bigtable Google Cloud Bigtable and Cloud Dataproc
DATAPROC https://cloud.google.com/dataproc/docs/connectors/cloud-storage Google Cloud Storage connector
DATAPROC https://www.quora.com/What-is-the-criteria-to-chose-Pig-Hive-Hbase-Storm-Solr-or-Spark-to-analyze-your-data-in-Hadoop What is the criteria to chose Pig, Hive, Hbase, Storm, Solr, or Spark to analyze your data in Hadoop?
DATAPROC https://stackoverflow.com/questions/13911501/when-to-use-hadoop-hbase-hive-and-pig When to use Hadoop, HBase, Hive and Pig?
DATAPROC https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/The-Differences-between-Pig-Hive-and-HBase/ta-p/31007 The Differences between Pig, Hive, and HBase
DATAPROC https://www.dezyre.com/article/difference-between-pig-and-hive-the-two-key-components-of-hadoop-ecosystem/79 Difference between Pig and Hive-The Two Key Components of Hadoop Ecosystem
DATAPROC http://www.hadoop360.datasciencecentral.com/blog/pig-vs-hive-vs-sql-difference-between-the-big-data-tools?context=tag-hadoop Pig vs Hive vs SQL – Difference between the Big Data Tools
DATAPROC https://hadoopecosystemtable.github.io/ The Hadoop Ecosystem Table
DATAPROC https://www.smartdatacollective.com/hadoop-toolbox-when-use-what/ Hadoop Toolbox: When to Use What
STACKDRIVER MONITORING https://cloud.google.com/monitoring/docs/ Stackdriver Monitoring documentation
STACKDRIVER LOGGING https://cloud.google.com/logging/docs/basic-concepts Basic Concepts
STACKDRIVER LOGGING https://cloud.google.com/logging/docs/export/using_exported_logs Using Exported Logs
STACKDRIVER LOGGING https://cloud.google.com/logging/docs/agent/ The Logging Agent
STACKDRIVER LOGGING https://cloud.google.com/logging/docs/audit/ Cloud Audit Logging
Stackdriver Debugger https://cloud.google.com/debugger/docs/ Stackdriver Debugger
STORAGE https://cloud.google.com/storage-options/ Choosing a Storage Option
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/overview What is Google Cloud Pub/Sub?
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/tasks-vs-pubsub Choosing between Cloud Tasks and Cloud Pub/Sub
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/subscriber Subscriber overview
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/publisher Publishing messages
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/ordering Ordering messages
CLOUD PUB/SUB https://cloud.google.com/pubsub/docs/replay-overview Replaying and discarding messages
IAM https://cloud.google.com/iam/docs/overview IAM Overview
IAM https://cloud.google.com/compute/docs/access/service-accounts Service Accounts
IAM https://cloud.google.com/iam/docs/resource-hierarchy-access-control Using Resource Hierarchy for Access Control
IAM https://cloud.google.com/iam/docs/using-iam-securely Using IAM Securely
IAM https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy Resource Hierarchy
Security https://cloud.google.com/armor/docs/security-policy-concepts Google Cloud Armor Security Policy Concepts
Security https://cloud.google.com/kms/docs/hsm Cloud HSM
Security https://cloud.google.com/kms/ CLOUD KEY MANAGEMENT SERVICE
Security https://cloud.google.com/dlp/ Cloud Data Loss Prevention
Security https://cloud.google.com/vpc-service-controls/docs/overview Overview of VPC Service Controls
Security https://cloud.google.com/security-command-center/docs/concepts-overview Cloud SCC conceptual overview
Security https://cloud.google.com/security-scanner/ CLOUD SECURITY SCANNER
Security https://cloud.google.com/identity/docs/concepts/overview Google Identity Platform Overview
Security https://cloud.google.com/iap/docs/concepts-overview Cloud Identity-Aware Proxy overview
security and compliance https://cloud.google.com/security/gdpr/ Google Cloud & the General Data Protection Regulation (GDPR)
security and compliance https://cloud.google.com/security/compliance/pci-dss/ Standards, Regulations & Certifications
security and compliance https://cloud.google.com/solutions/pci-dss-compliance-in-gcp PCI Data Security Standard compliance
Security https://cloud.google.com/blog/products/gcp/managing-encryption-keys-in-the-cloud-introducing-google-cloud-key-management-service?m=1 Managing encryption keys in the cloud: introducing Google Cloud Key Management Service
Security https://cloud.google.com/security/encryption-at-rest/ ENCRYPTION AT REST
Security https://cloud.google.com/security/compliance/hipaa-compliance/ HIPAA
Security https://cloud.google.com/security/compliance/coppa/ COPPA (U.S.)
Security https://cloud.google.com/security/compliance/fedramp/ FedRAMP
MACHINE LEARNING ENGINE https://cloud.google.com/ml-engine/ CLOUD MACHINE LEARNING ENGINE
Storage https://cloud.google.com/filestore/ CLOUD FILESTORE
Storage https://cloud.google.com/products/data-transfer/ CLOUD DATA TRANSFER
Cloud Composer https://cloud.google.com/composer/docs/concepts/overview Overview of Cloud Composer
Transfer Appliance https://cloud.google.com/transfer-appliance/docs/2.0/overview Transfer Appliance Overview
IOT https://cloud.google.com/iot/docs/concepts/overview Cloud IoT Core overview
Cloud Spanner https://cloud.google.com/spanner/docs/overview Cloud Spanner
Cloud Spanner https://cloud.google.com/spanner/docs/schema-and-data-model Schema and data model
Cloud Spanner https://cloud.google.com/spanner/docs/schema-design Schema design best practices
MACHINE LEARNING https://cloud.google.com/automl/ Cloud AutoML
MACHINE LEARNING https://cloud.google.com/automl-tables/ AutoML Tables
MACHINE LEARNING https://cloud.google.com/deep-learning-vm/ Deep Learning VM Image
MACHINE LEARNING https://cloud.google.com/tpu/docs/tpus Cloud Tensor Processing Units (TPUs)
MACHINE LEARNING https://opensource.com/article/18/12/introduction-kubeflow An introduction to Kubeflow
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/projects-models-versions-jobs Projects, Models, Versions, and Jobs
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/ml-solutions-overview Machine Learning Workflow
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/data-prep Preparing Data
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/training-overview Training Overview
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/hyperparameter-tuning-overview Overview of Hyperparameter Tuning
MLENGINE / TENSORFLOW https://cloud.google.com/ml-engine/docs/tensorflow/prediction-overview Prediction Overview
MACHINE LEARNING https://cloud.google.com/ml-engine/docs/how-tos/managing-models-jobs Managing Models and Jobs
MACHINE LEARNING http://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/ Supervised and Unsupervised Machine Learning Algorithms
DATALAB https://cloud.google.com/datalab/ CLOUD DATALAB
DATALAB https://cloud.google.com/blog/big-data/2016/05/how-to-forecast-demand-with-google-bigquery-public-datasets-and-tensorflow How to forecast demand with Google BigQuery, public datasets and TensorFlow
MACHINE LEARNING https://www.tensorflow.org/get_started/summaries_and_tensorboard ML – Tensorboard
MACHINE LEARNING https://www.tensorflow.org/get_started/graph_viz ML – Learning visualization
MACHINE LEARNING https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice How to choose algorithms for Microsoft Azure Machine Learning
MACHINE LEARNING https://medium.com/@nomadic_mind/new-to-machine-learning-avoid-these-three-mistakes-73258b3848a4 New to Machine Learning? Avoid these three mistakes
kafka https://kafka.apache.org/intro Kafka – Introduction
DATAPREP https://cloud.google.com/dataprep/docs/concepts Dataprep – Concepts
Cloud Memorystore https://cloud.google.com/memorystore/docs/redis/redis-overview Overview of Cloud Memorystore for Redis
Cloud Datastore https://cloud.google.com/datastore/docs/concepts/overview Cloud Datastore Overview
Cloud Datastore https://cloud.google.com/datastore/docs/firestore-or-datastore Choosing between Native Mode and Datastore Mode

Practice Tests

After the preparation is done, go through a few practice tests.
