Hadoop Training Course Content and Syllabus
Hadoop Course Content
• Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation
Use case walkthrough
• ETL
• Log Analytics
• Real Time Analytics
HBase for Developers
NoSQL Introduction
• Traditional RDBMS approach
• NoSQL introduction
• Hadoop & HBase positioning
HBase Introduction
• What it is, what it is not, its history and common use cases
• HBase Client - Shell, exercise
HBase Architecture
• Building Components
• Storage, B+ tree, Log Structured Merge Trees
• Region Lifecycle
• Read/Write Path
HBase Schema Design
• Introduction to HBase schema
• Column Family, Rows, Cells, Cell timestamp
• Deletes
• Exercise - build a schema, load data, query data
HBase Java API - Exercises
• Connection
• CRUD API
• Scan API
• Filters
• Counters
• HBase MapReduce
• HBase Bulk load
HBase Operations, cluster management
• Performance Tuning
• Advanced Features
• Exercise
• Recap and Q&A
MapReduce for Developers
Introduction
• Traditional Systems / Why Big Data / Why Hadoop
• Hadoop Basic Concepts/Fundamentals
Hadoop in the Enterprise
• Where Hadoop Fits in the Enterprise
• Review Use Cases
Architecture
• Hadoop Architecture & Building Blocks
• HDFS and MapReduce
Hadoop CLI
• Walkthrough
• Exercise
MapReduce Programming
• Fundamentals
• Anatomy of MapReduce Job Run
• Job Monitoring, Scheduling
• Sample Code Walk Through
• Hadoop API Walk Through
• Exercise
MapReduce Formats
• Input Formats, Exercise
• Output Formats, Exercise
Hadoop File Formats
MapReduce Design Considerations
MapReduce Algorithms
• Walkthrough of 2-3 Algorithms
MapReduce Features
• Counters, Exercise
• Map Side Join, Exercise
• Reduce Side Join, Exercise
• Sorting, Exercise
Use Case A (Long Exercise)
• Input Formats, Exercise
• Output Formats, Exercise
MapReduce Testing
Hadoop Ecosystem
• Oozie
• Flume
• Sqoop
• Exercise 1 (Sqoop)
• Streaming API
• Exercise 2 (Streaming API)
• HCatalog
• ZooKeeper
HBase Introduction
• Introduction
• HBase Architecture
MapReduce Performance Tuning
Development Best Practice and Debugging
Apache Hadoop for Administrators
Hadoop Fundamentals and Architecture
• Why Hadoop, Hadoop Basics and Hadoop Architecture
• HDFS and MapReduce
Hadoop Ecosystem Overview
• Hive
• HBase
• ZooKeeper
• Pig
• Mahout
• Flume
• Sqoop
• Oozie
Hardware and Software requirements
• Hardware, Operating System and Other Software
• Management Console
Deploy Hadoop ecosystem services
• Hive
• ZooKeeper
• HBase
• Administration
• Pig
• Mahout
• MySQL
• Setup Security
Enable Security: Configure Users, Groups, Secure HDFS, MapReduce, HBase and Hive
• Configuring User and Groups
• Configuring Secure HDFS
• Configuring Secure MapReduce
• Configuring Secure HBase and Hive
Manage and Monitor your cluster
Command Line Interface
Troubleshooting your cluster
Introduction to Big Data and Hadoop
Hadoop Overview
• Why Hadoop
• Hadoop Basic Concepts
• Hadoop Ecosystem: MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, HBase, Oozie, Mahout
• Where Hadoop fits in the Enterprise
• Review use cases
Apache Hive & Pig for Developers
Overview of Hadoop
• Big Data and the Distributed File System
• MapReduce
Hive Introduction
• Why Hive?
• Comparison with SQL
• Use Cases
Hive Architecture Building Blocks
• Hive CLI and Language (Exercise)
• HDFS Shell
• Hive CLI
• Data Types
• Hive Cheat-Sheet
• Data Definition Statements
• Data Manipulation Statements
• Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins
• Built-in Functions
• Union, Sub Queries, Sampling, Explain
Hive Use Case Implementation (Exercise)
• Use Case 1
• Use Case 2
• Best Practices
Advanced Features
• Transform and Map-Reduce Scripts
• Custom UDF
• UDTF
• SerDe
• Recap and Q&A
Pig Introduction
• Position Pig in Hadoop ecosystem
• Why Pig and not MapReduce
• Simple example (slides) comparing Pig and MapReduce
• Who is using Pig now and what are the main use cases
• Pig Architecture
• Discuss high level components of Pig
• Pig Grunt - How to Start and Use
Pig Latin Programming
• Data Types
• Cheat sheet
• Schema
• Expressions
• Commands and Exercise
• Load, Store, Dump, Relational Operations, Foreach, Filter, Group, Order By, Distinct, Join, Cogroup, Union, Cross, Limit, Sample, Parallel
Use Cases (working exercise)
• Use Case 1
• Use Case 2
• Use Case 3 (compare Pig and Hive)
Advanced Features, UDFs
Best Practices and common pitfalls
Mahout & Machine Learning
• Mahout Overview
• Mahout Installation
• Introduction to the Math Library
• Vector implementation and Operations (Hands-on exercise)
• Matrix Implementation and Operations (Hands-on exercise)
• Anatomy of a Machine Learning Application
Classification
• Introduction to Classification
• Classification Workflow
• Feature Extraction
• Classification Techniques (Hands-on exercise)
• Evaluation (Hands-on exercise)
Clustering
• Use Cases
• Clustering algorithms in Mahout
• K-means clustering (Hands-on exercise)
• Canopy clustering (Hands-on exercise)
Clustering
• Mixture Models
• Probabilistic Clustering - Dirichlet (Hands-on exercise)
• Latent Dirichlet Model (Hands-on exercise)
• Evaluating and Improving Clustering quality (Hands-on exercise)
• Distance Measures (Hands-on exercise)
Recommendation Systems
• Overview of Recommendation Systems
• Use cases
• Types of Recommendation Systems
• Collaborative Filtering (Hands-on exercise)
• Recommendation System Evaluation (Hands-on exercise)
• Similarity Measures
• Architecture of Recommendation Systems
• Wrap Up
Hadoop Training Duration in Gurgaon
Regular Classes (Morning, Daytime & Evening)
• Duration: 6 weeks
Weekend Training Classes (Saturday, Sunday & Holidays)
• Duration: 10 weeks
Fast Track Training Program (6+ hours of classes daily)
• Duration: within 6 weeks
Hadoop Trainer Profile & Placement
Our Hadoop Trainers
• More than 4 years of experience in Hadoop technologies
• Have worked on 4 real-time Hadoop projects
• Working at an MNC in Gurgaon
• Trained 1903+ students so far
• Strong theoretical and practical knowledge
• Hadoop-certified professionals
Hadoop (Hadoop Administration) Placement Training in Gurgaon
• 1903+ students trained
• 1499 students placed
• 892 interviews organized
• Placement Supported by InterviewDesk.com