Cloud Computing for Big Data
20 - 24 June 2018 | 06:30 PM - 09:30 PM | Dubai Knowledge Park | 1,500 USD
To process massive datasets, you need to set up clusters for both data storage and data processing. Many aspiring data scientists and engineers have little knowledge of or experience with doing this in a Linux environment. This course will enable you to master the skills required to set up cloud clusters for data storage using Hadoop's HDFS and for data processing using Spark.
Not only will you install and configure Hadoop and Spark clusters, but you will also learn how to configure your development environment (Jupyter Notebook) to access these clusters to store and process massive datasets.
Hadoop Cluster Installation and Configuration
Architecture of a Hadoop Cluster
Creating and Distributing SSH Keys
Downloading and Unpacking Hadoop Binaries
Setting up Environment Variables
Configuring the Master Node
Configuring the Slave Nodes
Configuring Memory Allocation
Formatting and Starting HDFS
Configuring YARN as a Job Scheduler
Running and Monitoring HDFS
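The memory-allocation step above usually follows a few rules of thumb: reserve some RAM for the OS and Hadoop daemons, then divide the rest into equally sized YARN containers. A minimal Python sketch of that arithmetic (the function name and the reserve/container defaults are illustrative assumptions, not course material; the property keys are standard `yarn-site.xml` settings):

```python
def yarn_memory_plan(node_ram_gb: int, reserved_gb: int = 2, container_gb: int = 2):
    """Split one worker node's RAM into YARN containers.

    reserved_gb is held back for the OS and Hadoop daemons; the remainder
    is divided into containers of container_gb each.
    """
    usable_gb = node_ram_gb - reserved_gb
    return {
        # RAM YARN may hand out on this node
        "yarn.nodemanager.resource.memory-mb": usable_gb * 1024,
        # largest / smallest single allocation the scheduler will grant
        "yarn.scheduler.maximum-allocation-mb": usable_gb * 1024,
        "yarn.scheduler.minimum-allocation-mb": container_gb * 1024,
        "containers_per_node": usable_gb // container_gb,
    }

plan = yarn_memory_plan(8)  # e.g. an 8 GB cloud instance
# -> 6144 MB usable, 3 containers of 2048 MB each
```

The resulting values go into `yarn-site.xml` on every node; oversubscribing them is a common cause of containers being killed by the NodeManager.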
Spark Cluster Installation and Configuration
Preparing your System for Spark Installation
Installing Spark on the Master Node
Installing Spark on the Slave Nodes
Integrating Spark with YARN
Running the Spark Cluster
Configuring Memory Allocation
Running a Spark Application on Top of the YARN Cluster
Monitoring Your Spark Applications
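When Spark runs on YARN, each executor's heap plus its off-heap overhead (`spark.executor.memoryOverhead`, which defaults to the larger of 384 MB and 10% of the heap) must fit inside one YARN container, or YARN will kill the executor. A sketch of that sizing, under those default assumptions (the function name is illustrative):

```python
def spark_executor_memory(container_mb: int) -> dict:
    """Fit executor heap + memory overhead into one YARN container.

    Overhead is assumed to follow Spark's default:
    max(384 MB, 10% of the executor heap).
    """
    heap_mb = container_mb - 384            # assume the flat 384 MB branch first
    if heap_mb * 0.10 > 384:                # large container: the 10% branch applies
        heap_mb = int(container_mb / 1.10)  # heap + 10% overhead == container
    overhead_mb = max(384, int(heap_mb * 0.10))
    return {
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }

conf = spark_executor_memory(2048)  # a 2 GB YARN container
# -> heap 1664m + overhead 384m == 2048 MB
```

These values can be passed via `--conf` flags to `spark-submit` or set in `spark-defaults.conf`.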
Running Massive Datasets on Spark and Hadoop Clusters
Configuring Jupyter Notebooks to Access Spark and Hadoop Clusters
Storing Massive Datasets on HDFS
Using a Spark Cluster to Process Massive Datasets
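One common way to wire Jupyter up to a Spark-on-YARN cluster is through PySpark's environment variables, so that launching `pyspark --master yarn` starts a notebook server whose kernels already hold a YARN-backed `spark` session. A minimal sketch (the paths are illustrative assumptions; adjust them to your installation):

```python
import os

# Where the cluster's core-site.xml / yarn-site.xml live -- lets the Spark
# driver discover the HDFS NameNode and the YARN ResourceManager.
os.environ["HADOOP_CONF_DIR"] = "/opt/hadoop/etc/hadoop"   # hypothetical path

# Python interpreter used by the executors on the worker nodes.
os.environ["PYSPARK_PYTHON"] = "python3"

# Run the PySpark driver inside Jupyter instead of the plain REPL.
os.environ["PYSPARK_DRIVER_PYTHON"] = "jupyter"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook --no-browser"
```

With these variables exported (for example from `~/.bashrc`), notebooks can read and write HDFS paths such as `hdfs://master:9000/...` through the `spark` session.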
IT professionals, data scientists, and big data engineers who are interested in setting up Hadoop and Spark clusters in the cloud and processing massive datasets on top of this infrastructure.
Participants who successfully complete this course are encouraged to take our Innosoft Cloud Computing Certification (ICCP) exam.