Detailed Course Outline
(DW6A1)
- Unit 1: Introduction to Big Data
- Exercise 1: Setting up the lab environment
- Unit 2: Introduction to IBM BigInsights
- Exercise 2: Getting started with IBM BigInsights
- Unit 3: IBM BigInsights for Analysts
- Exercise 3: Working with Big SQL and BigSheets
- Unit 4: IBM BigInsights for Data Scientists
- Exercise 4: Analyzing data with Big R, Jaql, and AQL
- Unit 5: IBM BigInsights for Enterprise Management
(DW6B1)
- Unit 1: IBM Open Platform with Apache Hadoop
- Exercise 1: Exploring the HDFS
- Unit 2: Apache Ambari
- Exercise 2: Managing Hadoop clusters with Apache Ambari
- Unit 3: Hadoop Distributed File System
- Exercise 3: File access & basic commands with HDFS
- Unit 4: MapReduce and YARN
- Topic 1: Introduction to MapReduce based on MR1
- Topic 2: Limitations of MR1
- Topic 3: YARN and MR2
- Exercise 4: Creating and coding a simple MapReduce job (possibly followed by a more complex second exercise)
- Unit 5: Apache Spark
- Exercise 5: Working with Spark RDDs in a Spark job
- Unit 6: Coordination, management, and governance
- Exercise 6: Apache ZooKeeper, Apache Slider, Apache Knox
- Unit 7: Data Movement
- Exercise 7: Moving data into Hadoop with Flume and Sqoop
- Unit 8: Storing and Accessing Data
- Topic 1: Representing Data: CSV, XML, JSON, and YAML
- Topic 2: Open Source Programming Languages: Pig, Hive, and others (R, Python, etc.)
- Topic 3: NoSQL Concepts
- Topic 4: Accessing Hadoop data using Hive
- Exercise 8: Performing CRUD operations using the HBase shell
- Topic 5: Querying Hadoop data using Hive
- Exercise 9: Using Hive to Access Hadoop / HBase Data
- Unit 9: Advanced Topics
- Topic 1: Controlling job workflows with Oozie
- Topic 2: Search using Apache Solr
- No lab exercises