Demo Video

Coming Soon

About Trainer

  • Trainer Information
  • Trainer Rahul has 12+ years of industry experience
  • Rahul is certified in Solaris, VMware, Cloudera, Hortonworks
  • Trained more than 200 students
  • Handled 50+ corporate training events
  • 8+ years of experience in infrastructure and 4+ years of experience in Hadoop administration

Course Outline


1. Introduction to Hadoop

  • Enterprise Data Trends @ Scale
  • What is Big Data?
  • A Market for Big Data
  • Most Common New Types of Data
  • Moving from Causation to Correlation
  • What is Hadoop?
  • Traditional Systems vs. Hadoop
  • What is Hadoop 2.0?
  • Overview of a Hadoop Cluster
  • Different distributions of Hadoop
  • Hadoop Use Case
  • Lab exercise: Log in to Your Cluster

2. HDFS Architecture

  • What is a File System?
  • OS Architecture
  • HDFS Architecture
  • Understanding Block Storage
  • Demonstration: Understanding Block Storage
  • The NameNode
  • The DataNodes
  • DataNode Failure
  • HDFS Clients

3. Installing Hadoop

  • Minimum Hardware Requirements
  • Minimum Software Requirements
  • A Formidable Starter Cluster
  • Lab exercise: Setting up the Environment
  • Lab exercise: Install HDP 2.0 Cluster using Ambari

4. Configuring Hadoop

  • Configuration Considerations
  • Deployment Layout
  • Configuring HDFS
  • What is Ambari?
  • Configuration via Ambari
  • Management
  • Monitoring
  • REST API
  • Lab exercise: Add a New Node to the Cluster
  • Lab exercise: Stopping and Starting HDP Services
  • Lab exercise: Using HDFS Commands

5. Ensuring Data Integrity

  • Replication Placement
  • Data Integrity – Writing Data
  • Data Integrity – Reading Data
  • Data Integrity – Block Scanning
  • Running a File System Check
  • What Does the File System Check Look For?
  • hadoop fsck Syntax
  • Data Integrity – File System Check: Commands & Output
  • hadoop dfsadmin Command
  • NameNode Information
  • Changing the Replication Factor
  • Lab exercise: Verify Data with Block Scanner and fsck

6. HDFS NFS Gateway

  • HDFS NFS Gateway Introduction
  • NFS Gateway Node
  • Configuring the HDFS NFS Gateway
  • Starting the NFS Gateway Service
  • User Authentication
  • Lab exercise: Mounting HDFS to a Local File System

7. YARN Architecture and MapReduce

  • What is YARN?
  • Hadoop as Next-Gen Platform
  • Beyond MapReduce
  • YARN Use Case
  • YARN Bird’s Eye View
  • Lifecycle of a YARN Application
  • ResourceManager
  • NodeManager
  • MapReduce
  • Understanding MapReduce
  • Configuring YARN
  • Configuring MapReduce tools
  • Lab exercise: Troubleshooting a MapReduce Job

8. Job Schedulers

  • Overview of Job Scheduling
  • The Built-in Schedulers
  • Overview of the Capacity Scheduler
  • Configuring the Capacity Scheduler
  • Defining Queues
  • Configuring Capacity Limits
  • Configuring User Limits
  • Configuring Permissions
  • Overview of the Fair Scheduler
  • Multi-Tenancy Limits
  • Lab exercise: Configuring the Capacity Scheduler

9. Enterprise Data Movement

  • Enterprise Data Movement
  • Challenges with a Traditional ETL Platform
  • Hadoop Based ETL Platform
  • Data Ingestion
  • Hadoop: Reducing Business Latency
  • Defining Data Layers
  • Distributed Copy (distcp) Command
  • distcp Options
  • Considerations for distcp
  • Using distcp for Backups
  • Lab exercise: Use distcp to Copy Data from a Remote Cluster

10. HDFS Web Services

  • What is WebHDFS?
  • Setting up WebHDFS
  • Using WebHDFS
  • WebHDFS Authentication
  • Copying Files to HDFS
  • Hadoop HDFS over HTTP
  • Who Uses WebHCat REST API?
  • Running WebHCat
  • Using WebHCat
  • Lab exercise: Using WebHDFS

11. Hive Administration

  • Introduction to Hive
  • Comparing Hive with RDBMS
  • Hive Components
  • Hive MetaStore
  • HiveServer2
  • Hive Command Line Interface
  • Processing Hive SQL Statements
  • Defining a Hive-Managed Table
  • Defining an External Table
  • Loading Data into Hive
  • Performing Queries
  • Guidelines for Architecting Hive Data
  • ORCFile Example
  • Hive Tables
  • Hive Query Optimizations
  • Hive/MR versus Hive/Tez
  • Compression
  • Hive Security
  • Lab exercise: Understanding Hive Tables

12. Sqoop

  • Overview of Sqoop
  • The Sqoop Import Tool
  • Importing a Table
  • Importing Specific Columns
  • Importing from a Query
  • The Sqoop Export Tool
  • Exporting to a Table
  • Lab exercise: Using Sqoop

13. Flume

  • Flume Introduction
  • Installing Flume
  • Flume Events
  • Flume Sources
  • Flume Channels
  • Flume Channel Selectors
  • Flume Sinks
  • Multiple Sinks
  • Flume Interceptors
  • Design Patterns
  • Configuring Individual Components
  • Flume Netcat Source Example
  • Flume Exec Source Example
  • Flume Configuration
  • Monitoring Flume
  • Lab exercise: Install and Test Flume

14. Oozie

  • Oozie Overview
  • Oozie Components
  • Jobs, Workflows, Coordinators, Bundles
  • Workflow Actions and Decisions
  • Oozie Job Submission
  • Oozie Server Workflow Coordinator
  • Oozie Console
  • Interfaces to Oozie
  • Oozie Server Configuration
  • Oozie Scripts
  • The Oozie CLI
  • Using the Oozie CLI
  • Submit Jobs through HTTP
  • Oozie Actions
  • Oozie Metrics
  • Lab exercise: Running an Oozie Workflow

15. Monitoring Hadoop Services

  • Ambari
  • Monitoring Architecture
  • Monitoring HDP2 Clusters
  • Ambari Web Interfaces
  • Ambari Services – HDFS
  • Ganglia
  • Ganglia Monitoring a Hadoop Cluster
  • Nagios
  • Nagios – Ambari Interface
  • Nagios UI
  • Configuring Nagios
  • Monitoring JVM Processes
  • Understanding JVM Memory
  • Eclipse Memory Analyzer
  • JVM Memory Heap Dump
  • Java Management Extensions (JMX)

16. Commissioning Nodes

  • Architectural Review
  • Decommissioning and Commissioning Nodes
  • Decommissioning Worker Nodes
  • Steps for Decommissioning a Worker Node
  • Decommissioning Node States
  • Steps for Commissioning a Worker Node
  • Balancer
  • Balancer Threshold Setting
  • Configuring Balancer Bandwidth
  • Lab exercise: Commissioning & Decommissioning Worker Nodes

17. Backup and Recovery

  • What Should You Back Up?
  • HDFS Snapshots
  • HDFS Data – Backups
  • HDFS Data – Automate & Restore
  • Hive & Ambari Backup
  • Lab exercise: Using HDFS Snapshots

18. Rack Awareness

  • Rack Awareness
  • YARN Rack Awareness
  • Replica Placement
  • Rack Topology
  • Rack Topology Script
  • Configuring the Rack Topology Script
  • Lab exercise: Configuring Rack Awareness

19. NameNode High Availability

  • NameNode Architecture HDP1
  • NameNode High Availability
  • HDFS HA Components
  • Understanding NameNode HA
  • NameNodes in HA
  • Failover Modes
  • NameNode Architectures
  • hdfs haadmin Command
  • Protecting Metadata Repositories
  • Red Hat HA
  • VMware HA
  • Lab exercise: Configure NameNode High Availability using Ambari

20. Security in Hadoop

  • Security Concepts
  • Kerberos Synopsis
  • HDP Security Overview
  • Securing HDP – Authentication
  • Securing HDP – Authorization
  • Lab exercise: Securing an HDP Cluster

Resources

Coming Soon