Wednesday 29 March 2017

Main contributors to Hadoop

There are a number of Hadoop distributions. A comprehensive list can be found on the Hadoop wiki page. We will examine three of them:

• Cloudera Distribution of Hadoop (CDH)
• Hortonworks Data Platform (HDP)
• MapR

Cloudera


Cloudera was founded in 2008 by big data experts from Facebook, Google, Oracle and Yahoo. It was the first company to develop and distribute Apache Hadoop-based software, and it has the largest user base, with the most clients. Although the core of the distribution is based on the open source Apache Hadoop project, it also provides a proprietary Cloudera Management Suite that automates installation and configuration and offers other conveniences for users, such as reducing deployment time and displaying a real-time count of nodes. CDH is currently in its fifth major version and is considered a mature Hadoop distribution. The paid version of CDH comes with proprietary management software, Cloudera Manager.

Hortonworks Data Platform (HDP)


Hortonworks, founded in 2011, has quickly emerged as one of the leading Hadoop vendors. Its distribution provides an open source platform based on Apache Hadoop for storing, managing and analyzing big data. Hortonworks is the only commercial vendor to distribute a completely open source Apache Hadoop without additional proprietary software. The Hortonworks distribution, HDP 2.5, can be downloaded directly from the company's website free of charge and is easy to install. Hortonworks engineers are behind most of Hadoop's recent innovations, including YARN, which improves on the original MapReduce design by separating resource management from data processing, so that frameworks other than MapReduce can run on the same cluster. HDP is currently in its second major version and is considered the rising star among Hadoop distributions. It comes with free and open source management software called Ambari.

MapR


In its standard, open source edition, Apache Hadoop comes with a number of limitations. Vendor distributions aim to overcome the issues that users typically encounter in the standard edition. Under the free Apache license, all three distributions provide users with updates to the core Hadoop software. When choosing among them, one should look at the additional value each provides to customers: improving the reliability of the system (detecting and fixing bugs, etc.), providing technical assistance and expanding functionality.


MapR comes with its own management console. The product is offered in three editions named M3, M5 and M7. M3 is a free version without high availability, M5 is the company's standard commercial distribution, and M7 is a paid version with a rewritten HBase API.
