Detailed Course Outline
Module 1 - Introduction to Data Engineering
Topics:
- The role of a data engineer
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Partnering effectively with other data teams
- Managing data access and governance
- Building production-ready pipelines
- Google Cloud customer case study
Objectives:
- Discuss the role of a data engineer.
- Discuss benefits of doing data engineering in the cloud.
- Discuss the challenges of data engineering practice and how building data pipelines in the cloud helps address them.
- Review and understand the purpose of a data lake versus a data warehouse, and when to use each.
Module 2 - Building a Data Lake
Topics:
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building a data lake by using Cloud Storage
- Securing Cloud Storage
- Storing all types of data
- Cloud SQL as your OLTP system
Objectives:
- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
- Explain how to use Cloud SQL for a relational data lake.
Module 3 - Building a Data Warehouse
Topics:
- The modern data warehouse
- Introduction to BigQuery
- Getting started with BigQuery
- Loading data into BigQuery
- Exploring schemas
- Schema design
- Nested and repeated fields
- Optimizing with partitioning and clustering
Objectives:
- Discuss the requirements of a modern data warehouse.
- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
- Discuss the core concepts of BigQuery and review the options for loading data into BigQuery.