Study material for Google Professional Data Engineer exam

23 May

A Data Engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A Data Engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.   – Google


Google Cloud Professional Data Engineers are currently in high demand because they can build and maintain the pipelines that collect, transform, and publish data.

For this exam, I recommend having strong hands-on experience with Google Cloud's data-focused services. If you don't have that experience and are still targeting this certification, don't worry: this article is for everybody aspiring to the certificate.

What Google expects from you in this certification exam is to “Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.”

Before you start going through the links I've compiled in this article, scan through this link to understand the topics covered by the certification exam.


If you prefer video courses, I suggest the following courses.

Video Courses and Labs


If you prefer reading, you can go through the following links. I recommend learning some topics through reading and hands-on practice; video training alone won't help much.

Reading

First, I recommend going through the Data Dossier prepared by Matthew, which is freely available online. Second, learn SQL thoroughly, with a deep understanding of advanced querying techniques. You can also take Google's free “Machine Learning Crash Course”.
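As a small taste of the "advanced querying" skills worth practicing, here is a minimal sketch of a window function that picks the top row in each group. It runs against Python's built-in `sqlite3` module (assuming a bundled SQLite of 3.25 or newer, which added window functions) so you can try it without a cloud project; the `sales` table and its columns are hypothetical sample data. The same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` pattern is valid in BigQuery standard SQL.

```python
import sqlite3


def top_sale_per_region(rows):
    """Return the largest sale in each region using a window function.

    rows: iterable of (region, seller, amount) tuples for a hypothetical
    `sales` table created in an in-memory SQLite database.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, seller TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            SELECT region, seller, amount
            FROM (
                SELECT region, seller, amount,
                       -- number rows within each region, biggest sale first
                       ROW_NUMBER() OVER (
                           PARTITION BY region ORDER BY amount DESC
                       ) AS rn
                FROM sales
            )
            WHERE rn = 1          -- keep only the top row per region
            ORDER BY region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("EU", "alice", 120),
        ("EU", "bob", 300),
        ("US", "carol", 250),
        ("US", "dave", 90),
    ]
    for row in top_sale_per_region(sample):
        print(row)  # ('EU', 'bob', 300) then ('US', 'carol', 250)
```

Unlike `GROUP BY region` with `MAX(amount)`, the window function keeps the whole row (including `seller`), which is the kind of distinction the exam's SQL-heavy BigQuery questions tend to probe.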

A few more links, by topic:

GENERAL | GCP Cheatsheet (data_engineering_on_GCP.pdf) | https://github.com/ml874/Data-Engineering-on-GCP-Cheatsheet
GENERAL | GCP cheat sheets by Yaron Hollander | https://medium.com/@hello_92179
GCE | Compute Engine – Scalable, High-Performance Virtual Machines | https://cloud.google.com/compute/
GCE | Preemptible VM Instances | https://cloud.google.com/compute/docs/instances/preemptible
GCE | Storage Options | https://cloud.google.com/compute/docs/disks/
GCE | Instance Groups | https://cloud.google.com/compute/docs/instance-groups/
GCE | Running Startup Scripts | https://cloud.google.com/compute/docs/startupscript
GCE | Running Shutdown Scripts | https://cloud.google.com/compute/docs/shutdownscript
GCE | Sustained Use Discounts | https://cloud.google.com/compute/docs/sustained-use-discounts
Networking | Overview of Virtual Private Cloud | https://cloud.google.com/vpc/docs/overview
Networking | Virtual Private Cloud (VPC) Network Overview | https://cloud.google.com/vpc/docs/vpc
Networking | Advanced VPC Concepts | https://cloud.google.com/vpc/docs/advanced-vpc
Networking | Shared VPC Overview | https://cloud.google.com/vpc/docs/shared-vpc
Networking | VPC Network Peering | https://cloud.google.com/vpc/docs/vpc-peering
Networking | Private Access Options for Services | https://cloud.google.com/vpc/docs/private-access-options
Networking | Cloud VPN Overview | https://cloud.google.com/vpn/docs/concepts/overview
Networking | Cloud Interconnect Overview | https://cloud.google.com/interconnect/docs/concepts/overview
Networking | Choose an interconnect type | https://cloud.google.com/interconnect/docs/how-to/choose-type
Networking | Cloud NAT | https://cloud.google.com/nat/docs/overview
GCS | Understanding Google Cloud Storage Features | https://cloud.google.com/appengine/docs/standard/java/googlecloudstorageclient/understanding-storage-features
GCS | GCS Key Terms | https://cloud.google.com/storage/docs/key-terms
GCS | Storage Classes | https://cloud.google.com/storage/docs/storage-classes
GCS | Bucket Locations | https://cloud.google.com/storage/docs/bucket-locations
GCS | Data encryption options | https://cloud.google.com/storage/docs/encryption/
GCS | Object metadata | https://cloud.google.com/storage/docs/metadata
GCS | Cloud Pub/Sub notifications for Cloud Storage | https://cloud.google.com/storage/docs/pubsub-notifications
GCS | Object change notification | https://cloud.google.com/storage/docs/object-change-notification
GCS | Object Versioning | https://cloud.google.com/storage/docs/object-versioning
GCS | Object Lifecycle Management | https://cloud.google.com/storage/docs/lifecycle
GCS | Requester Pays | https://cloud.google.com/storage/docs/requester-pays
GCS | Retention policies using Bucket Lock | https://cloud.google.com/storage/docs/bucket-lock
GCS | Transcoding of gzip-compressed files | https://cloud.google.com/storage/docs/transcoding
GCS | Cross-origin resource sharing (CORS) | https://cloud.google.com/storage/docs/cross-origin
GCS | Sharing and Collaboration | https://cloud.google.com/storage/docs/collaboration
GCS | Access control options | https://cloud.google.com/storage/docs/access-control/
GCS | Cloud Identity and Access Management | https://cloud.google.com/storage/docs/access-control/iam
GCS | Access Control Lists (ACLs) | https://cloud.google.com/storage/docs/access-control/lists
GCS | Performing authenticated browser downloads | https://cloud.google.com/storage/docs/access-control/cookie-based-authentication
GCS | Signed URLs | https://cloud.google.com/storage/docs/access-control/signed-urls
GCS | POST Object | https://cloud.google.com/storage/docs/xml-api/post-object
GCS | Composite Objects and Parallel Uploads | https://cloud.google.com/storage/docs/composite-objects
GCS | Storage Transfer Service Overview | https://cloud.google.com/storage-transfer/docs/overview
GCS | Best practices for Cloud Storage | https://cloud.google.com/storage/docs/best-practices
Cloud SQL | Cloud SQL | https://cloud.google.com/sql/
Cloud SQL | Overview of the High Availability Configuration | https://cloud.google.com/sql/docs/mysql/high-availability
Cloud SQL | Connection options for external applications | https://cloud.google.com/sql/docs/mysql/external-connection-methods
Cloud SQL | About the Cloud SQL Proxy | https://cloud.google.com/sql/docs/mysql/sql-proxy
Cloud SQL | Replication options | https://cloud.google.com/sql/docs/mysql/replication/
Cloud SQL | Best practices for importing and exporting data | https://cloud.google.com/sql/docs/mysql/import-export/
Cloud SQL | Overview of Backups | https://cloud.google.com/sql/docs/mysql/backup-recovery/backups
Cloud SQL | Overview of Restoring an Instance | https://cloud.google.com/sql/docs/mysql/backup-recovery/restore
Cloud SQL | Cloud SQL FAQ | https://cloud.google.com/sql/faq
BIGQUERY | What is BigQuery? | https://cloud.google.com/bigquery/what-is-bigquery
BIGQUERY | Projects | https://cloud.google.com/bigquery/docs/projects
BIGQUERY | Storing data | https://cloud.google.com/bigquery/docs/storing-data
BIGQUERY | Introduction to loading data into BigQuery | https://cloud.google.com/bigquery/docs/loading-data
BIGQUERY | Streaming Data into BigQuery | https://cloud.google.com/bigquery/streaming-data-into-bigquery
BIGQUERY | Loading data from a local data source | https://cloud.google.com/bigquery/docs/loading-data-local
BIGQUERY | Introduction to loading data from Cloud Storage | https://cloud.google.com/bigquery/docs/loading-data-cloud-storage
BIGQUERY | Partitioned Tables | https://cloud.google.com/bigquery/docs/partitioned-tables
BIGQUERY | Introduction to clustered tables | https://cloud.google.com/bigquery/docs/clustered-tables
BIGQUERY | Query pricing | https://cloud.google.com/bigquery/pricing#queries
BIGQUERY | Exporting table data | https://cloud.google.com/bigquery/docs/exporting-data
BIGQUERY | Introduction to external data sources | https://cloud.google.com/bigquery/external-data-sources
BIGQUERY | Access control | https://cloud.google.com/bigquery/docs/access-control
BIGQUERY | Controlling access to datasets | https://cloud.google.com/bigquery/docs/dataset-access-controls
BIGQUERY | Using Cloud DLP to scan BigQuery data | https://cloud.google.com/bigquery/docs/scan-with-dlp
BIGQUERY | Protecting data with Cloud KMS keys | https://cloud.google.com/bigquery/docs/customer-managed-encryption
BIGQUERY | Slots | https://cloud.google.com/bigquery/docs/slots
BIGQUERY | BigQuery best practices: Controlling costs | https://cloud.google.com/bigquery/docs/best-practices-costs
BIGQUERY | Updating dataset properties | https://cloud.google.com/bigquery/docs/updating-datasets
BIGQUERY | BigQuery Best practices: Optimizing storage | https://cloud.google.com/bigquery/docs/best-practices-storage
BIGQUERY | Managing input data and data sources | https://cloud.google.com/bigquery/docs/best-practices-performance-input
BIGQUERY | Optimizing communication between slots | https://cloud.google.com/bigquery/docs/best-practices-performance-communication
BIGQUERY | Optimizing query computation | https://cloud.google.com/bigquery/docs/best-practices-performance-compute
BIGQUERY | Managing query outputs | https://cloud.google.com/bigquery/docs/best-practices-performance-output
BIGQUERY | Avoiding SQL anti-patterns | https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
BIGQUERY | Query plan and timeline | https://cloud.google.com/bigquery/query-plan-explanation
BIGQUERY | Introduction to views | https://cloud.google.com/bigquery/docs/views-intro
BIGQUERY | Controlling access to views | https://cloud.google.com/bigquery/docs/view-access-controls
BIGQUERY | Creating authorized views | https://cloud.google.com/bigquery/docs/authorized-views
BIGQUERY | Introduction to BigQuery Data Transfer Service | https://cloud.google.com/bigquery/docs/transfer-service-overview
BIGQUERY | Data Manipulation Language | https://cloud.google.com/bigquery/sql-reference/data-manipulation-language
BIGQUERY | Using cached query results | https://cloud.google.com/bigquery/docs/cached-results
BIGQUERY | Querying multiple tables using a wildcard table | https://cloud.google.com/bigquery/docs/querying-wildcard-tables
BIGTABLE | Overview of Cloud Bigtable | https://cloud.google.com/bigtable/docs/overview
BIGTABLE | Instances, clusters, and nodes | https://cloud.google.com/bigtable/docs/instances-clusters-nodes
BIGTABLE | Designing Your Schema | https://cloud.google.com/bigtable/docs/schema-design
BIGTABLE | Schema Design for Time Series Data | https://cloud.google.com/bigtable/docs/schema-design-time-series
BIGTABLE | Access Control | https://cloud.google.com/bigtable/docs/access-control
BIGTABLE | Overview of Replication | https://cloud.google.com/bigtable/docs/replication-overview
BIGTABLE | Examples of Replication Settings | https://cloud.google.com/bigtable/docs/replication-settings
BIGTABLE | Application Profiles | https://cloud.google.com/bigtable/docs/app-profiles
BIGTABLE | Failovers | https://cloud.google.com/bigtable/docs/failovers
BIGTABLE | Overview of Key Visualizer | https://cloud.google.com/bigtable/docs/keyvis-overview
BIGTABLE | Heatmap Patterns | https://cloud.google.com/bigtable/docs/keyvis-patterns
BIGTABLE | Understanding Cloud Bigtable performance | https://cloud.google.com/bigtable/docs/performance
BIGTABLE | Choosing Between SSD and HDD Storage | https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
DB COMPARISON | Google Cloud Bigtable vs Google Cloud Datastore | http://stackoverflow.com/questions/30085326/google-cloud-bigtable-vs-google-cloud-datastore
DB COMPARISON | System Properties Comparison Google Cloud Bigtable vs. Google Cloud Datastore | https://db-engines.com/en/system/Google+Cloud+Bigtable%3BGoogle+Cloud+Datastore
DB COMPARISON | When to Pick Google Bigtable vs Other Cloud Platform Databases | http://terrenceryan.com/blog/index.php/when-to-pick-google-bigtable-vs-other-cloud-platform-databases/
DATAFLOW | Programming model for Apache Beam | https://cloud.google.com/dataflow/docs/concepts/beam-programming-model
DATAFLOW | Apache Beam Programming Guide | https://beam.apache.org/documentation/programming-guide/
DATAFLOW | Cloud Dataflow security and permissions | https://cloud.google.com/dataflow/docs/concepts/security-and-permissions
DATAFLOW | Cloud Dataflow Access Control Guide | https://cloud.google.com/dataflow/docs/concepts/access-control
DATAFLOW | Google Dataflow vs Apache Spark | https://stackoverflow.com/questions/33518104/google-dataflow-vs-apache-spark
DATAFLOW / STACKDRIVER | Monitoring and improving your Google Cloud Dataflow pipelines with Google Stackdriver | https://cloud.google.com/blog/big-data/2017/03/monitoring-and-improving-your-google-cloud-dataflow-pipelines-with-google-stackdriver.html
DATAPROC | What is Google Cloud Dataproc? | https://cloud.google.com/dataproc/docs/concepts/overview
DATAPROC | Custom machine types | https://cloud.google.com/dataproc/docs/concepts/compute/custom-machine-types
DATAPROC | Cloud Dataproc Local SSDs | https://cloud.google.com/dataproc/docs/concepts/compute/dataproc-local-ssds
DATAPROC | Preemptible VMs | https://cloud.google.com/dataproc/docs/concepts/preemptible-vms
DATAPROC | Initialization actions | https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions
DATAPROC | Cloud Dataproc FAQ | https://cloud.google.com/dataproc/docs/resources/faq
DATAPROC | Google Cloud Bigtable and Cloud Dataproc | https://cloud.google.com/dataproc/docs/connectors/cloud-bigtable
DATAPROC | Google Cloud Storage connector | https://cloud.google.com/dataproc/docs/connectors/cloud-storage
DATAPROC | What is the criteria to chose Pig, Hive, Hbase, Storm, Solr, or Spark to analyze your data in Hadoop? | https://www.quora.com/What-is-the-criteria-to-chose-Pig-Hive-Hbase-Storm-Solr-or-Spark-to-analyze-your-data-in-Hadoop
DATAPROC | When to use Hadoop, HBase, Hive and Pig? | https://stackoverflow.com/questions/13911501/when-to-use-hadoop-hbase-hive-and-pig
DATAPROC | The Differences between Pig, Hive, and HBase | https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/The-Differences-between-Pig-Hive-and-HBase/ta-p/31007
DATAPROC | Difference between Pig and Hive-The Two Key Components of Hadoop Ecosystem | https://www.dezyre.com/article/difference-between-pig-and-hive-the-two-key-components-of-hadoop-ecosystem/79
DATAPROC | Pig vs Hive vs SQL – Difference between the Big Data Tools | http://www.hadoop360.datasciencecentral.com/blog/pig-vs-hive-vs-sql-difference-between-the-big-data-tools?context=tag-hadoop
DATAPROC | The Hadoop Ecosystem Table | https://hadoopecosystemtable.github.io/
DATAPROC | Hadoop Toolbox: When to Use What | https://www.smartdatacollective.com/hadoop-toolbox-when-use-what/
STACKDRIVER MONITORING | Stackdriver Monitoring documentation | https://cloud.google.com/monitoring/docs/
STACKDRIVER LOGGING | Basic Concepts | https://cloud.google.com/logging/docs/basic-concepts
STACKDRIVER LOGGING | Using Exported Logs | https://cloud.google.com/logging/docs/export/using_exported_logs
STACKDRIVER LOGGING | The Logging Agent | https://cloud.google.com/logging/docs/agent/
STACKDRIVER LOGGING | Cloud Audit Logging | https://cloud.google.com/logging/docs/audit/
Stackdriver Debugger | Stackdriver Debugger | https://cloud.google.com/debugger/docs/
STORAGE | Choosing a Storage Option | https://cloud.google.com/storage-options/
CLOUD PUB/SUB | What is Google Cloud Pub/Sub? | https://cloud.google.com/pubsub/docs/overview
CLOUD PUB/SUB | Choosing between Cloud Tasks and Cloud Pub/Sub | https://cloud.google.com/pubsub/docs/tasks-vs-pubsub
CLOUD PUB/SUB | Subscriber overview | https://cloud.google.com/pubsub/docs/subscriber
CLOUD PUB/SUB | Publishing messages | https://cloud.google.com/pubsub/docs/publisher
CLOUD PUB/SUB | Ordering messages | https://cloud.google.com/pubsub/docs/ordering
CLOUD PUB/SUB | Replaying and discarding messages | https://cloud.google.com/pubsub/docs/replay-overview
IAM | IAM Overview | https://cloud.google.com/iam/docs/overview
IAM | Service Accounts | https://cloud.google.com/compute/docs/access/service-accounts
IAM | Using Resource Hierarchy for Access Control | https://cloud.google.com/iam/docs/resource-hierarchy-access-control
IAM | Using IAM Securely | https://cloud.google.com/iam/docs/using-iam-securely
IAM | Resource Hierarchy | https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy
Security | Google Cloud Armor Security Policy Concepts | https://cloud.google.com/armor/docs/security-policy-concepts
Security | Cloud HSM | https://cloud.google.com/kms/docs/hsm
Security | CLOUD KEY MANAGEMENT SERVICE | https://cloud.google.com/kms/
Security | Cloud Data Loss Prevention | https://cloud.google.com/dlp/
Security | Overview of VPC Service Controls | https://cloud.google.com/vpc-service-controls/docs/overview
Security | Cloud SCC conceptual overview | https://cloud.google.com/security-command-center/docs/concepts-overview
Security | CLOUD SECURITY SCANNER | https://cloud.google.com/security-scanner/
Security | Google Identity Platform Overview | https://cloud.google.com/identity/docs/concepts/overview
Security | Cloud Identity-Aware Proxy overview | https://cloud.google.com/iap/docs/concepts-overview
security and compliance | Google Cloud & the General Data Protection Regulation (GDPR) | https://cloud.google.com/security/gdpr/
security and compliance | Standards, Regulations & Certifications | https://cloud.google.com/security/compliance/pci-dss/
security and compliance | PCI Data Security Standard compliance | https://cloud.google.com/solutions/pci-dss-compliance-in-gcp
Security | Managing encryption keys in the cloud: introducing Google Cloud Key Management Service | https://cloud.google.com/blog/products/gcp/managing-encryption-keys-in-the-cloud-introducing-google-cloud-key-management-service?m=1
Security | ENCRYPTION AT REST | https://cloud.google.com/security/encryption-at-rest/
Security | HIPAA | https://cloud.google.com/security/compliance/hipaa-compliance/
Security | COPPA (U.S.) | https://cloud.google.com/security/compliance/coppa/
Security | FedRAMP | https://cloud.google.com/security/compliance/fedramp/
MACHINE LEARNING ENGINE | CLOUD MACHINE LEARNING ENGINE | https://cloud.google.com/ml-engine/
Storage | CLOUD FILESTORE | https://cloud.google.com/filestore/
Storage | CLOUD DATA TRANSFER | https://cloud.google.com/products/data-transfer/
Cloud Composer | Overview of Cloud Composer | https://cloud.google.com/composer/docs/concepts/overview
Transfer Appliance | Transfer Appliance Overview | https://cloud.google.com/transfer-appliance/docs/2.0/overview
IOT | Cloud IoT Core overview | https://cloud.google.com/iot/docs/concepts/overview
Cloud Spanner | Cloud Spanner | https://cloud.google.com/spanner/docs/overview
Cloud Spanner | Schema and data model | https://cloud.google.com/spanner/docs/schema-and-data-model
Cloud Spanner | Schema design best practices | https://cloud.google.com/spanner/docs/schema-design
MACHINE LEARNING | Cloud AutoML | https://cloud.google.com/automl/
MACHINE LEARNING | AutoML Tables | https://cloud.google.com/automl-tables/
MACHINE LEARNING | Deep Learning VM Image | https://cloud.google.com/deep-learning-vm/
MACHINE LEARNING | Cloud Tensor Processing Units (TPUs) | https://cloud.google.com/tpu/docs/tpus
MACHINE LEARNING | An introduction to Kubeflow | https://opensource.com/article/18/12/introduction-kubeflow
MLENGINE / TENSORFLOW | Projects, Models, Versions, and Jobs | https://cloud.google.com/ml-engine/docs/tensorflow/projects-models-versions-jobs
MLENGINE / TENSORFLOW | Machine Learning Workflow | https://cloud.google.com/ml-engine/docs/tensorflow/ml-solutions-overview
MLENGINE / TENSORFLOW | Preparing Data | https://cloud.google.com/ml-engine/docs/tensorflow/data-prep
MLENGINE / TENSORFLOW | Training Overview | https://cloud.google.com/ml-engine/docs/tensorflow/training-overview
MLENGINE / TENSORFLOW | Overview of Hyperparameter Tuning | https://cloud.google.com/ml-engine/docs/tensorflow/hyperparameter-tuning-overview
MLENGINE / TENSORFLOW | Prediction Overview | https://cloud.google.com/ml-engine/docs/tensorflow/prediction-overview
MACHINE LEARNING | Managing Models and Jobs | https://cloud.google.com/ml-engine/docs/how-tos/managing-models-jobs
MACHINE LEARNING | Supervised and Unsupervised Machine Learning Algorithms | http://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
DATALAB | CLOUD DATALAB | https://cloud.google.com/datalab/
DATALAB | How to forecast demand with Google BigQuery, public datasets and TensorFlow | https://cloud.google.com/blog/big-data/2016/05/how-to-forecast-demand-with-google-bigquery-public-datasets-and-tensorflow
MACHINE LEARNING | ML – Tensorboard | https://www.tensorflow.org/get_started/summaries_and_tensorboard
MACHINE LEARNING | ML – Learning visualization | https://www.tensorflow.org/get_started/graph_viz
MACHINE LEARNING | How to choose algorithms for Microsoft Azure Machine Learning | https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
MACHINE LEARNING | New to Machine Learning? Avoid these three mistakes | https://medium.com/@nomadic_mind/new-to-machine-learning-avoid-these-three-mistakes-73258b3848a4
kafka | Kafka – Introduction | https://kafka.apache.org/intro
DATAPREP | Dataprep – Concepts | https://cloud.google.com/dataprep/docs/concepts
Cloud Memorystore | Overview of Cloud Memorystore for Redis | https://cloud.google.com/memorystore/docs/redis/redis-overview
Cloud Datastore | Cloud Datastore Overview | https://cloud.google.com/datastore/docs/concepts/overview
Cloud Datastore | Choosing between Native Mode and Datastore Mode | https://cloud.google.com/datastore/docs/firestore-or-datastore
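Bigtable row-key design (the "Designing Your Schema" and "Schema Design for Time Series Data" links above) is one of the more hands-on exam topics. As a rough illustration, here is a minimal Python sketch of one common time-series pattern: the device ID goes first (field promotion) so writes spread across the key space instead of hotspotting on a timestamp prefix, and a reversed, zero-padded timestamp makes each device's newest readings sort first. The function name, the `#` separator, and the `MAX_TS_MS` bound are illustrative assumptions, not part of any Bigtable API; the sketch only builds key bytes and never talks to Bigtable.

```python
# Hypothetical upper bound for epoch-millisecond timestamps (good until ~2286).
# Subtracting from it "reverses" the timestamp so newer rows sort first.
MAX_TS_MS = 10**13


def time_series_row_key(device_id: str, ts_ms: int) -> bytes:
    """Build a hypothetical row key of the form b"<device_id>#<reversed ts>".

    - device_id first: writes from many devices land on different key ranges
      rather than all appending to the end of the table.
    - reversed, zero-padded timestamp: within one device, lexicographic byte
      order puts the most recent reading first.
    """
    if "#" in device_id:
        # Keep the separator unambiguous so prefix scans on b"<device_id>#" work.
        raise ValueError("device_id must not contain '#'")
    if not 0 <= ts_ms < MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MS - 1 - ts_ms
    # Zero-pad to 13 digits so string order matches numeric order.
    return f"{device_id}#{reversed_ts:013d}".encode("utf-8")


if __name__ == "__main__":
    keys = [time_series_row_key("sensor-42", t) for t in (1_000, 3_000, 2_000)]
    for key in sorted(keys):
        print(key)  # newest reading (ts=3000) sorts first
```

Reversing the timestamp only pays off when "latest N readings for a device" is the dominant query; if you mostly scan forward in time, a plain zero-padded timestamp after the device ID keeps the key simpler.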


Practice Tests

Once your preparation is done, go through a few practice tests.