Course Overview
In this course, you will learn how to build an operational data lake that supports the analysis of structured and unstructured data. You will learn about the components and functions of the services involved in creating a data lake. You will use AWS Lake Formation to build a data lake, AWS Glue to build a data catalog, and Amazon Athena to analyze data. The course presentations and exercises reinforce what you have learned by analyzing several common data lake architectures.
Who should attend
This course is designed for:
- Solutions Architects
- Big Data Developers
- Data Architects and Analysts
- Other data analysis experts
Prerequisites
We recommend that participants in this course meet the following requirements:
- Good practical knowledge of key AWS services such as Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Experience with a programming or scripting language
- Basic knowledge of the Linux operating system and the command line interface
- A laptop is required to take part in the exercises; tablets are not suitable
Course Objectives
What you will learn in this course:
- Collect large amounts of data with services such as Kinesis Streams and Firehose, and store it securely for the long term in Amazon Simple Storage Service (S3).
- Create a metadata index of your data lake.
- Choose the best tools to capture, store, process, and analyze the data in your data lake.
- Apply what you have learned in hands-on labs by building a complete solution.
Course Content
The course covers the following concepts:
- The key services for building a serverless data lake architecture
- A data analysis solution that addresses the capture, storage, processing, and analysis workflows
- Repeatable deployment of templates to implement a data lake solution
- Creating a metadata index and enabling search
- Setting up a large data transfer pipeline for multiple data sources
- Data transformation using simple functions triggered by events
- Data processing using the appropriate tools and services for the application
- Available options for optimized analysis of processed data
- Best practices for deployment and operations