Hadoop and its Additional Technologies


Apache Hadoop is an open source platform for developing distributed applications that process very large amounts of data. It provides both distributed storage and computational capabilities.

Computational Layer: This layer uses a programming model and processing framework called MapReduce.

Distributed Storage Layer: This layer stores data in a distributed file system called HDFS (Hadoop Distributed File System).
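
To make the computational layer more concrete, here is a minimal single-machine sketch of the MapReduce model in plain Python. It is not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes that read their input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence; on a cluster they are distributed.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The point of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets a framework spread the work over many machines.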

Here are some of the additional technologies commonly used alongside Hadoop:

Hive: It is a distributed data warehouse that manages data stored in HDFS. It provides a SQL-based query language for querying that data.

HBase: It is a column-oriented distributed database that uses HDFS for its underlying storage. It supports both batch-style computations using MapReduce and point queries (random reads).

Pig: It is a data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

R: R is a programming language and software environment for statistical computing and graphics. It is widely used by data miners and statisticians for data analysis and for developing statistical software.

RHadoop: It provides R interfaces to the open source Hadoop distributed computing environment, so that Hadoop can be used from within R.

Cascading: Cascading is open source, Apache-licensed software for creating and executing complex data processing workflows in JVM-based languages.

Crunch: Apache Crunch is a Java library, based on FlumeJava, for writing, testing, and running Hadoop MapReduce pipelines. It aims to make pipelines composed of many user-defined functions simple to write, easy to test, and efficient to run.

RHipe: RHIPE (R and Hadoop Integrated Programming Environment) integrates R with Hadoop so that data can be analyzed using Hadoop tools from within the R environment.

ZooKeeper: It provides a distributed coordination service with primitives, such as distributed locks, that can be used to build distributed applications.
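
As a rough illustration of the distributed lock idea mentioned for ZooKeeper, the sketch below imitates the common lock recipe in which each client registers a sequentially numbered entry and the client with the lowest number holds the lock. This is an in-memory toy, not the ZooKeeper client API: the class ToyLockService and its methods are hypothetical names, and a real deployment would use ephemeral sequential znodes on a ZooKeeper ensemble rather than a Python dictionary.

```python
import itertools


class ToyLockService:
    """In-memory stand-in for a coordination service (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self._counter = itertools.count()  # source of increasing sequence numbers
        self._waiters = {}                 # sequence number -> client name

    def request_lock(self, client):
        """Register a client and return its sequence number (its place in line)."""
        seq = next(self._counter)
        self._waiters[seq] = client
        return seq

    def holder(self):
        """Return the client with the lowest sequence number, or None if nobody waits."""
        if not self._waiters:
            return None
        return self._waiters[min(self._waiters)]

    def release(self, seq):
        """Remove a client's entry, for example on unlock or when its session ends."""
        self._waiters.pop(seq, None)


if __name__ == "__main__":
    service = ToyLockService()
    a = service.request_lock("worker-a")
    b = service.request_lock("worker-b")
    print(service.holder())  # worker-a registered first, so it holds the lock
    service.release(a)
    print(service.holder())  # the lock passes to the next entry in line
```

Ordering clients by sequence number gives every participant the same view of who holds the lock, which is the kind of agreement a coordination service exists to provide.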