Course Overview
Learn how to create an end-to-end, hardware-accelerated machine learning pipeline for large datasets. Throughout the development process, you’ll use diagnostic tools to identify delays and learn to mitigate common pitfalls.
Pré- requisitos
- Basic knowledge of a standard data science workflow on tabular data. To gain an adequate understanding, we recommend this article.
- Knowledge of distributed computing using Dask. To gain an adequate understanding, we recommend the “Get Started” guide from Dask.
- Completion of the DLI’s Fundamentals of Accelerated Data Science course or an ability to manipulate data using cuDF and some experience building machine learning models using cuML.
Objetivos do Curso
- Develop and deploy an accelerated end-to-end data processing pipeline for large datasets
- Scale data science workflows using distributed computing
- Perform DataFrame transformations that take advantage of hardware acceleration and avoid hidden slowdowns
- Enhance machine learning solutions through feature engineering and rapid experimentation
- Improve data processing pipeline performance by optimizing memory management and hardware utilization