Hadoop is an open-source
framework for storing and processing large amounts of data in a distributed
manner. It is a batch processing system and has two main components - the Hadoop Distributed
File System (HDFS) and MapReduce. HDFS is the distributed storage component, while MapReduce is the distributed
computing component. It is an Apache Foundation project with a strong community. There are
many real-world big data challenges, and Hadoop's core functionality alone was found
wanting in the face of those challenges. There are gaps between the solution Hadoop provides and specific
big data challenges. For example, Hadoop is a batch processing system, so if you want
a real-time solution out of Hadoop, it falls short. This gap has led to the springing up
of another Apache project, HBase. HBase is a column-oriented database that sits
on top of the Hadoop framework and provides real-time capabilities to it.
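To make the MapReduce idea concrete, here is a minimal sketch of the classic word-count pattern in plain Python - no Hadoop cluster involved, just the three conceptual phases Hadoop runs for you at scale: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. All function names here are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here, sum the counts).
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data is distributed"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'distributed': 1}
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle moves data over the network, but the logical flow is the same as this toy version.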
Similarly, a lot of projects have
sprung up around core Hadoop functionality, and together they are called the Hadoop
ecosystem. Each project in the Hadoop ecosystem provides a solution to a
specific big data problem or challenge. The Hadoop ecosystem keeps getting bigger
as the technology is used more and more. Many projects in the
ecosystem belong to the Apache Foundation, while others belong to various vendors.
Some projects are evolving quickly thanks to their strong
developer communities, while others are relatively new.
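Before moving on, here is a toy sketch of the column-oriented data model that a store like HBase uses, purely illustrative and not the HBase API: a table maps a row key to column families, each holding qualifier-to-value cells. Cheap point lookups by row key are what give HBase its real-time flavor, in contrast to batch MapReduce scans over whole files.

```python
# Toy in-memory column-family store (illustrative only, not HBase's API).
# Structure: row key -> column family -> qualifier -> value.
table = {}

def put(row_key, family, qualifier, value):
    # Write one cell under the given row key.
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

def get(row_key):
    # Point lookup by row key - a cheap random read, no full-table scan.
    return table.get(row_key, {})

# Hypothetical example data for illustration.
put("user1", "info", "name", "Asha")
put("user1", "info", "city", "Pune")
put("user2", "info", "name", "Ravi")
print(get("user1"))  # {'info': {'name': 'Asha', 'city': 'Pune'}}
```

The real HBase adds timestamps per cell, sorted row keys, and distribution across region servers, but the row-key/column-family/qualifier shape above is the core of its model.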
Let's take a closer look
at the Hadoop ecosystem projects in the coming posts.