Apache Ambari is an open source project whose main purpose is to deploy, manage, and monitor Hadoop clusters. In this post we will see what Apache Ambari is and how it differs from Cloudera Manager, look at the high-level architecture of Ambari, and then provision (in simple terms, deploy or install) a Hadoop cluster.

Cloudera Manager is cluster management software for the Cloudera distribution. So why do we need another cluster management tool when we already have Cloudera Manager? Cloudera Manager is proprietary software from Cloudera, used to manage Hadoop clusters running Cloudera's Distribution including Apache Hadoop, or CDH for short. Apache Ambari is an open source project, and Hortonworks, another major Hadoop vendor, has adopted Ambari as its tool of choice to provision, manage, and monitor clusters running its own distribution, the Hortonworks Data Platform (HDP).

If you would like to explore more about Apache Ambari and other Hadoop administration concepts, check out our Hadoop Administrator In Real World course.

One of the common questions we get from students and our community is: can we use Cloudera Manager to manage an HDP cluster, or Ambari to manage a CDH cluster? Neither is possible. Even though both CDH and HDP are derived from the same open source Hadoop project, Cloudera and Hortonworks have made several changes to their libraries, and Cloudera Manager and Ambari are each designed to work with their respective platforms.
So for this reason we can't use Ambari to manage a CDH cluster, and we can't use Cloudera Manager to manage an HDP cluster.

Prepare for installation

Let's look at the prerequisites and the node setup before we get to the installation and configuration of the Hadoop cluster. Before all that, let's look at Ambari's architecture.

Architecture

The architecture of Ambari is very similar to that of Cloudera Manager. The Ambari server is installed, usually on its own dedicated node, and an Ambari agent runs on every single node in our Hadoop cluster. The Ambari server pushes commands to the agents, and the agents send heartbeats back to the server. This way the Ambari server has a complete view of the cluster and is able to control the services on each node.

Prerequisites

Let's look at the prerequisites to run Ambari, starting with the supported operating systems. We have been using Ubuntu so far in our course, so we will continue with Ubuntu and go with the latest supported version, Ubuntu Trusty 14.04. Next, the software requirements: everything listed is pretty straightforward, so we don't have to worry too much. The next important requirement is the JDK. Ambari supports JDK 1.7 and 1.8; we will go with JDK 1.8 and make sure the release we install is 1.8.0_40 or later.

EC2 Instances

Here is our game plan. We will launch 4 EC2 Ubuntu instances in AWS. One instance will run the Ambari server, and the other 3 will form our Hadoop cluster. If you are new to AWS, follow this post on creating instances on EC2 and preparing them for a Hadoop installation.

Install & configure Hadoop cluster with Ambari

We are now ready to install ambari-server on node 1. To install it, we run the command apt-get install ambari-server.
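On Ubuntu 14.04 the ambari-server package comes from the Hortonworks Ambari repository, which has to be added first. A minimal sketch of the steps on node 1, assuming an Ambari 2.x release (the repository URL and version number below are examples, not the only valid ones; pick the list file that matches the release you want):

```shell
# Add the Hortonworks Ambari repository and its signing key
# (URL and version are assumptions; adjust to your Ambari release)
sudo wget -O /etc/apt/sources.list.d/ambari.list \
  http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.4.2.0/ambari.list
sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
sudo apt-get update

# Install the server package on node 1 only
sudo apt-get install -y ambari-server
```

The agents on the other nodes do not need to be installed by hand; the install wizard bootstraps them over SSH using the private key we provide later.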
Now that ambari-server is installed, we need to run ambari-server setup to perform a couple of setup steps that Ambari needs before it can run properly.

So let's execute ambari-server setup. The very first thing the setup does is download and install the Java Cryptography Extension (JCE) policy files; these are needed when we decide to configure components like Kerberos in our cluster. Here we select JDK 8 because that is what we installed in our environment. Next we need to accept the license, so we enter yes.

For Ambari to work properly, it needs to store information such as the cluster configuration and details about the nodes in a reliable database. By default Ambari stores this information in a PostgreSQL database. We are not going to perform any advanced configuration; we will stick with the defaults, so we say no to this prompt.

Setup is now complete and we are ready to start Ambari. Let's run ambari-server start. Once Ambari is started, we can check its status with the ambari-server status command.

We can get to the Ambari user interface on port 8080 on the node where it is installed. The default username and password are admin and admin; don't forget to change the password later to something strong. Here is the Ambari home page.

We have 3 main options on the home screen: deploy a cluster (which is what we will do next), manage users and groups, and deploy views. Think of a view as a pluggable UI component. Say, for example, that as an administrator you would like a better visualization of the jobs running in your cluster. You can of course use the Resource Manager's web user interface, but it only gives you a long list of jobs and does not provide a good summary of the current state of the jobs in your cluster.
In that case, you can create a small web application using the Hadoop APIs that visually represents the jobs that are currently running and the ones that have failed or succeeded, customized to your needs. You would have to code the application yourself to create a view from scratch, but Ambari already ships with a few views to help with day-to-day admin tasks. You will most likely never need to create a view, but if you do need to customize something, it is good to know that Ambari offers a way to do so.

Alright, let's click "Launch Install Wizard" to set up our Hadoop cluster. We are going to set up a 3-node Hadoop cluster with key services like HDFS, YARN, Pig, and Hive.

We need to provide a name for the cluster; I am going to name it HIRW_CLUSTER. Next we need to select the version of the Hortonworks Data Platform; we are going to select HDP 2.2.

Hosts

Next, we need to provide the list of nodes that will be part of our Hadoop cluster. We are going to enter the private DNS names of the EC2 instances we launched in AWS. The Ambari server needs the private key to log in to these nodes, so we select the .pem file that serves as the key, and we also enter the username that corresponds to the key. The username associated with the key is ubuntu, so we enter that on this screen. Click Register and Confirm; if the hosts are reachable and the key is good, all 3 hosts will be confirmed, which means Ambari can both reach the nodes and log in to them.

Select and configure services

The next screen lets us select the services we want to install in our cluster; we will select HDFS, YARN, Pig, and Hive.
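Everything the wizard shows is also exposed through Ambari's REST API, which is handy for scripting checks like the host registration we just performed. A sketch, assuming the server runs on a host named ambari-host and still has the default admin credentials (both are assumptions about your environment):

```shell
# List the hosts whose agents have registered with the Ambari server
curl -s -u admin:admin http://ambari-host:8080/api/v1/hosts

# Details for a single registered host, including its state
curl -s -u admin:admin http://ambari-host:8080/api/v1/hosts/<private-dns-of-node>
```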
We are not planning to install other services like HBase and Oozie at this time.

Based on our selections, Ambari lists the other dependent services that need to be included in our cluster. In the next screen, we select the nodes where core services like the NameNode, ResourceManager, Secondary NameNode, and ZooKeeper will run. We are going to run the NameNode and ResourceManager on the same node; in a production cluster, they would typically run on separate dedicated nodes. ZooKeeper will run on all 3 nodes, and the Secondary NameNode will run on node 2, along with the services for Hive.

In the next screen, we designate where the DataNodes and NodeManagers will run. We will run DataNodes and NodeManagers on all nodes, and we will install the clients on all nodes as well. This means we can run HDFS, YARN, Pig, and Hive commands from any node. The NFSGateway allows HDFS to be mounted as a drive so you can browse HDFS as if it were a local file system; we will install it on all nodes too.

Click next. The following screen gives us the option to change configuration values; we will go ahead with the defaults. We see 2 errors that we need to take care of, one in Hive and the other in Ambari Metrics. We need to set a password for the Hive database so we can log in to the database if needed. On the Ambari Metrics page we need to provide a password for Grafana. Grafana is an open source tool for building beautiful and powerful dashboards; it is great for plotting time series data, which makes it great for visualizing our cluster for monitoring purposes. It is basically a charting application, and Ambari includes it. The password we enter here is for Grafana's dashboard, which can be used to build charts as needed.
So give a password on both the Hive and Ambari Metrics pages.

We are almost done here. In the next screen we review all the selections we have made so far. Everything looks good, so let's click Deploy. Now we can sit back and monitor the progress of the installation; Ambari takes care of installing all the services we selected, in parallel across all nodes. So let's wait for the installation to complete.

There you go. The installation is now complete, and it is successful. Let's click next to see the summary and finally hit Complete. Here is the cluster home page, now with good-looking charts. Everything is green, which confirms our cluster is in good shape.
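Every wizard operation, including this deployment, is tracked by Ambari as a numbered request, so progress can also be polled over the REST API while the installation runs. A sketch (the host name, cluster name, and request id below match this walkthrough but may differ in yours):

```shell
# Poll the progress of the deployment request (id 1 is typically the first request)
curl -s -u admin:admin \
  "http://ambari-host:8080/api/v1/clusters/HIRW_CLUSTER/requests/1?fields=Requests/progress_percent,Requests/request_status"
```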
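To recap, the server-side commands used in this walkthrough were just these few, run on node 1 (the final curl is one way to confirm the server is answering, assuming the default admin credentials are still in place):

```shell
# One-time setup: JCE policy files, JDK selection, embedded PostgreSQL defaults
sudo ambari-server setup

# Start the server and verify it is running
sudo ambari-server start
sudo ambari-server status

# The web UI and REST API should now answer on port 8080
curl -s -u admin:admin http://localhost:8080/api/v1/clusters
```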