There are a number of distributions of Hadoop. A comprehensive list can be
found on the Hadoop wiki page. We will examine three of them:
• Cloudera Distribution of Hadoop (CDH)
• Hortonworks Data Platform (HDP)
• MapR
Cloudera
Cloudera was founded in 2008 by big data experts from Facebook, Google,
Oracle and Yahoo. It was the first company to develop and distribute Apache
Hadoop-based software, and it has the largest user base, with the greatest
number of clients. Although the core of the distribution is based on open
source Apache Hadoop, it also provides a proprietary Cloudera Management
Suite that automates installation and configuration and offers other
conveniences for users, such as reducing deployment time and displaying a
real-time count of nodes. CDH is currently in its fifth major version and is
considered a mature Hadoop distribution. The paid version of CDH comes with
proprietary management software, Cloudera Manager.
Hortonworks Data Platform (HDP)
Hortonworks, founded in 2011, has quickly emerged as one of the leading
vendors of Hadoop. The distribution provides an open source platform based
on Apache Hadoop for analyzing, storing and managing big data. Hortonworks
is the only commercial vendor to distribute a completely open source Apache
Hadoop without additional proprietary software. The Hortonworks
distribution, HDP 2.5, can be downloaded directly from their website free of
cost and is easy to install. The engineers of Hortonworks are behind most of
Hadoop's recent innovations, including YARN, which improves on the original
MapReduce-only design by allowing other data processing frameworks to run on
the same cluster. HDP is currently in its second major version and is
considered the rising star among Hadoop distributions. It comes with free
and open source management software called Ambari.
MapR
In its standard, open source edition, Apache Hadoop software comes with a
number of limitations. Vendor distributions aim to overcome the issues that
users typically encounter in the standard edition. Under the free Apache
license, all three distributions provide users with updates to the core
Hadoop software. But when it comes to picking one of them, one should look
at the additional value it provides to customers in terms of improving the
reliability of the system (detecting and fixing bugs, etc.), providing
technical assistance, and expanding functionality.
MapR comes with its own management console. The product is offered in three
editions: M3, M5, and M7. M3 is a free version without high availability, M5
is the standard commercial distribution from the company, and M7 is a paid
version with a rewritten HBase API.