Hadoop Overview and Its Ecosystem
- Hadoop is an open-source implementation of the MapReduce platform and a distributed file system, written in Java.
- Hadoop is a collection of tools, plus an ecosystem of projects built on top of those tools.
- The problem Hadoop solves is how to store and process big data. When we need to store and process petabytes of information, the monolithic approach to computing no longer makes sense.
- When data is loaded into the system, it is split into blocks, typically 64 MB or 128 MB in size.
- In the first part of the MapReduce system, Map tasks each work on a relatively small portion of the data, typically a single block.
- A master program allocates work to nodes so that a Map task works on a block of data stored locally on that node whenever possible, and many nodes work in parallel, each on its own part of the overall data set (see the sketch after this list).
- Hadoop consists of two core components:
- The Hadoop Distributed File System (HDFS)
- MapReduce
- There are many other projects based around core Hadoop, often referred to as the Hadoop ecosystem.
- The Hadoop ecosystem includes Pig, Hive, HBase, Flume, Oozie, Sqoop, and ZooKeeper.
- A set of machines running HDFS and MapReduce is known as a Hadoop cluster.
- In a Hadoop cluster, individual machines are known as nodes; a cluster can have as few as one node or as many as several thousand.
- In general, the more nodes a Hadoop cluster has, the better its performance.
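To make the Map-and-Reduce idea above concrete, here is a minimal word-count sketch using Hadoop's Java MapReduce API. It is an illustrative example only: the class names are our own, and the assumption is that each Map task reads lines from one block of input and emits (word, 1) pairs, which the Reduce side then sums.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: each instance typically processes one block of the input,
// emitting (word, 1) for every word it sees.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1)
        }
    }
}

// Reducer: receives all counts for one word, gathered from every Map task,
// and sums them into the final total.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}
```

Because many Map tasks run in parallel, one per block wherever a local copy exists, the same small piece of code is applied across the whole data set at once.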
Hadoop Cluster
Hadoop is composed of five separate daemons:
- Name Node: holds the metadata for HDFS.
- Secondary Name Node: performs housekeeping functions for the Name Node; it is not a backup or hot standby for the Name Node.
- Data Node: stores actual HDFS data blocks.
- Job Tracker: manages MapReduce jobs and distributes individual tasks to the machines running Task Trackers.
- Task Tracker: instantiates and monitors individual Map and Reduce tasks.
We can consider nodes to fall into two different categories:
- Master Nodes: run the Name Node, Secondary Name Node, and Job Tracker daemons.
- Slave Nodes: run the Data Node and Task Tracker daemons; a slave node runs both of these daemons.
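As an illustration of how these daemons cooperate, here is a minimal, hypothetical sketch using Hadoop's Java FileSystem API: the client asks the Name Node for a file's metadata (its block locations), and the bytes themselves are then streamed from the Data Nodes that store those blocks. The path shown is a made-up example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // client talks to the Name Node for metadata

        // The Name Node returns the block locations for this path;
        // the actual bytes are then read from the Data Nodes holding those blocks.
        Path path = new Path("/user/hadoop/sample.txt");   // hypothetical example path
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```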
Basic cluster configuration
On very small clusters, the Name Node, Job Tracker, and Secondary Name Node can all reside on a single machine; it is typical to move them onto separate machines as the cluster grows beyond 20-30 nodes.
Each dotted box in the diagram above represents a separate Java Virtual Machine (JVM).
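To show where a client names these master machines, here is a minimal sketch using Hadoop's Java Configuration API with the classic MR1-era property names that match the Job Tracker / Task Tracker model above. The hostnames and ports are placeholders, not real addresses; on a very small cluster they could all resolve to the same machine, while larger clusters use dedicated masters.

```java
import org.apache.hadoop.conf.Configuration;

public class ClusterClientConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Placeholder hostnames: substitute the addresses of your own masters.
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");   // Name Node address
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");     // Job Tracker address (MR1)

        System.out.println("HDFS master:  " + conf.get("fs.default.name"));
        System.out.println("Job Tracker:  " + conf.get("mapred.job.tracker"));
    }
}
```

In practice these values usually live in core-site.xml and mapred-site.xml rather than being set in code, but the sketch shows which daemons a client must be able to reach.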
About Hadoop Online Training @ BigClasses
BigClasses is one of the best online training organizations offering Hadoop training. We have qualified and experienced faculty who conduct the online sessions. We provide study materials and 24-hour support to our national and international learners. If you are interested in Hadoop online training, contact us for detailed course information and free demo classes.
India: +91 800 811 4040 USA: +1 732 325 1626
Website: www.bigclasses.com Email: info@bigclasses.com