Sunday, 20 July 2014

What is the Hadoop Ecosystem?

Hadoop is an open-source framework for storing and processing large amounts of data in a distributed manner. It is a batch processing system with two main components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is the distributed storage component, while MapReduce is the distributed computing component. Hadoop is an Apache Software Foundation project with a strong community. However, when measured against many real-world big data challenges, Hadoop's core functionality was found wanting: there are gaps between what Hadoop provides and what a specific big data challenge demands. For example, Hadoop is a batch processing system, so if you want a real-time solution out of Hadoop alone, it falls short. This gap led to another Apache project, HBase. HBase is a column-oriented database that sits on top of the Hadoop framework and provides real-time capabilities.
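To make the MapReduce side of this concrete, here is a minimal sketch of the programming model Hadoop implements, simulated locally in plain Python (no cluster, no Hadoop APIs): a classic word count. The function names (`mapper`, `reducer`, `run_job`) are illustrative, not part of any Hadoop library.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum all the counts collected for one word.
    return word, sum(counts)

def run_job(lines):
    # Shuffle/sort step: gather intermediate pairs and group them by key,
    # as the Hadoop framework would do between the map and reduce phases.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(key, (c for _, c in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["big data needs big storage", "big compute too"]))
# → {'big': 3, 'compute': 1, 'data': 1, 'needs': 1, 'storage': 1, 'too': 1}
```

On a real cluster the map and reduce phases run in parallel across many machines, with HDFS supplying the input splits and storing the output; the batch nature of this two-phase flow is exactly why real-time queries needed a separate project like HBase.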

Similarly, many projects have sprung up around core Hadoop functionality, and together they are called the Hadoop ecosystem. Each project in the ecosystem provides a solution to a specific big data problem or challenge. The ecosystem keeps growing as the technology sees wider adoption. Some projects belong to the Apache Foundation while others come from various vendors; some are evolving quickly thanks to strong developer communities, while others are still relatively new.

Let's have a closer look at the Hadoop ecosystem projects in the coming posts.
