Big Data and Hadoop

Preference: Evening Course
Dates: 09 - 13 December 2018
Timing: 07:00 PM - 10:00 PM
Location: Dubai Knowledge Park

Course Description

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is designed to bring you up to speed on some of the best technologies for the task. Leading organizations such as Google, Facebook, Netflix, Airbus, Amazon, and NASA use Hadoop and Spark to solve their big data problems.

This course will enable you to learn and master the most popular Big Data and Hadoop technologies, including HDFS, MapReduce, Pig, Hive, and Spark. It’s filled with hands-on activities and projects, and participants will work on real-world use cases from various industries.

Unit 1 – Big Data Overview

  • Overview of the Hadoop Ecosystem
  • Setting up your Hadoop Cluster

Unit 2 – Using Hadoop’s Core: HDFS and MapReduce

  • HDFS: What it is, and how it works
  • MapReduce: What it is, and how it works
  • How MapReduce distributes processing
  • Practical Project Using MapReduce
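
As a rough illustration of the map, shuffle, and reduce stages covered in this unit, here is a minimal single-machine sketch in plain Python. It does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example. On a real cluster the framework would run many mappers and reducers in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Each mapper call sees only one line and each reducer call sees only one key, so in principle both stages could be spread across many machines.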

Unit 3 – Programming Hadoop with Pig

  • Introducing Pig
  • Pig Latin
  • Practical Project Using Pig

Unit 4 – Programming with Spark

  • What is Spark?
  • The Resilient Distributed Dataset (RDD)
  • DataFrames and Spark 2.0

Unit 5 – Using relational data stores with Hadoop

  • What is Hive?
  • How Hive works
  • Integrating a Relational Database (MySQL) with Hadoop
  • Use Sqoop to import data from MySQL to HDFS
  • Use Sqoop to export data from Hadoop to MySQL

Unit 6 – Using non-relational data stores with Hadoop

  • Why NoSQL?
  • What is HBase?
  • Practical Project Using HBase

Unit 7 – Feeding Data to your Cluster

  • Kafka Explained
  • Setting up Kafka, and publishing some data
  • Publishing web logs with Kafka
  • Flume Explained
  • Set up Flume and publish logs with it
  • Set up Flume to monitor a directory and store its data in HDFS

Unit 8 – Final Project: Processing a Massive Dataset with Hadoop and Spark

Who Should Attend

  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  • Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend “big data” at scale.
  • Project, program, or product managers who want to understand the high-level architecture of Big Data and Hadoop.

Prerequisites

  • Experience with Python programming, relational databases, and SQL.

Participants who successfully complete this course are encouraged to take the Innosoft Certified Big Data Professional Exam (BD-200).