Unit 02 Lab 1: Ambari

Part 1: Overview

Apache Ambari is a management platform for your Hadoop cluster. It is used to provision, administer, secure, and monitor your Hadoop cluster through an easy-to-use web interface. Here’s a brief marketing video that explains Ambari’s role in the Hadoop ecosystem:

NOTE: This lab is not a comprehensive tour of Ambari. It is intended to give you just enough knowledge so you can operate the Hadoop cluster you will use throughout this course.

Ambari can do far more than we will need it to do, much of which is beyond the scope of this course. We can distill its use here down to two use cases:

  1. The Ambari dashboard provides instant feedback on the health of our cluster, which we can act on to remedy any issues we encounter. We will use this feature to verify that the services required by future labs are in good health.
  2. The Ambari views provide web-based access to Hadoop client tools such as Pig, Hive, Zeppelin, and HDFS. In future labs we will access these tools through both the command line and the Ambari views.

Learning Outcomes

Upon completing this lab you will be able to:

Requirements

To complete this lab you will need:

Before You Begin

Before you start this lab you should:

Part 2: Walk-Through

Quick Tour of Ambari

To start, let’s tour the Ambari dashboard and its feature set through this short video:

The Ambari Dashboard

After you log in you will see the Ambari dashboard. There’s a lot going on here. Since this is a real-time dashboard, your screen will not match the screenshot exactly. Ambari Dashboard

The Main Menu

The top row of the Ambari dashboard is the main menu. Let’s explore the menu from left to right.

Services

On the left-hand side of the dashboard is a view of all the services in the cluster. Each service has 1 of 4 icons indicating its current status:

NOTE: A green or red status does not necessarily mean a service is usable or unusable. A misconfigured service could be running but not working properly; conversely, a service could have a component down but still operate effectively! This is why service checks exist, which we will cover in the next section.

As explained at the beginning of the lab, in this course you’ll spend most of your time with Ambari on two use cases. Both are covered in the next section.
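
If you prefer to check service health without the UI, Ambari also exposes the same status information through its REST API. Below is a minimal sketch in Python that lists every service and its state. It assumes the sandbox defaults (Ambari at http://sandbox:8080, a cluster named Sandbox, the admin/admin credentials) and the third-party requests library; adjust these for your environment.

```python
# A minimal sketch, assuming the sandbox defaults (host "sandbox",
# port 8080, cluster name "Sandbox", admin/admin credentials).
import requests

AMBARI = "http://sandbox:8080/api/v1"
AUTH = ("admin", "admin")

# Ask for each service's state only (Ambari's "partial response" syntax).
resp = requests.get(
    f"{AMBARI}/clusters/Sandbox/services",
    params={"fields": "ServiceInfo/state"},
    auth=AUTH,
)
resp.raise_for_status()

# STARTED roughly corresponds to green in the dashboard; INSTALLED means stopped.
for item in resp.json()["items"]:
    info = item["ServiceInfo"]
    print(f"{info['service_name']:<12} {info['state']}")
```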

Managing Services

Before you begin future labs you will be asked to verify that the Hadoop services required by that lab are working properly. This portion of the lab explains, in detail, how to do this.

Click on the HDFS service from the Dashboard. You should now see the Service Summary Page for the HDFS service.

NOTE: Don’t be concerned if your service does not match the screenshot. We will correct that shortly.

Service Summary


Service Operations Management

Let’s explore how Ambari performs operations on a service, using HDFS as an example. (A scripted equivalent using Ambari’s REST API follows these steps.)

  1. From the HDFS Summary page, click the Service Actions button and select Restart All to restart all components associated with HDFS. This works whether or not the components are currently running.
  2. You will see a confirmation dialog.
    Restart HDFS Confirmation
    This is very important because shutting down HDFS means no clients will be able to access the cluster. This has major implications in a production environment, but in our sandbox it’s a non-issue. At the confirmation dialog, select Confirm Restart All.
  3. This will open the Background Operations dialog we saw earlier. The difference now is that there’s an actual running task. HDFS Background Operation
    Let’s see what’s going on here. Click on Restart all components for HDFS to drill down and view the hosts involved in the operation.
  4. You will now see a list of hosts involved in the process of restarting HDFS. Included with the hosts are progress bars.
    HDFS Hosts
    Since most of the services are running on sandbox, click on sandbox.hortonworks.com to view the components being restarted on that host.
  5. This will drill down into a list of components on sandbox being restarted. Again, the check boxes are updated in real time as operations complete:
    HDFS components on Sandbox
    Let’s click on Restart NameNode to see the progress of that operation.
  6. You will now see the actual output from the command! This is very useful when a service will not start due to errors, as you will be able to see the actual error message. HDFS NameNode component
    Click OK to close the dialog.
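
The restart you just performed in the UI can also be scripted against Ambari’s REST API. The sketch below stops HDFS and then starts it again, which is roughly what Restart All does; it assumes the same sandbox defaults as before (cluster Sandbox, admin/admin, the requests library) and is illustrative rather than production-ready.

```python
# A hedged sketch: restart HDFS by stopping and then starting it, which is
# roughly what the UI's "Restart All" does. Assumes sandbox defaults
# (http://sandbox:8080, cluster "Sandbox", admin/admin).
import time

import requests

AMBARI = "http://sandbox:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # Ambari requires this header on writes

def set_service_state(service, state, context):
    """Ask Ambari to move a service to the given state. Returns the queued
    background request (a 202 response), or None if nothing needed to change."""
    body = {
        "RequestInfo": {"context": context},
        "Body": {"ServiceInfo": {"state": state}},
    }
    r = requests.put(
        f"{AMBARI}/clusters/Sandbox/services/{service}",
        json=body, auth=AUTH, headers=HEADERS,
    )
    r.raise_for_status()
    return r.json() if r.status_code == 202 else None

def wait_for(request):
    """Poll a queued background request until it reaches a terminal state."""
    while request:
        status = requests.get(request["href"], auth=AUTH).json()
        state = status["Requests"]["request_status"]
        if state in ("COMPLETED", "FAILED", "ABORTED"):
            return state
        time.sleep(5)

wait_for(set_service_state("HDFS", "INSTALLED", "Stop HDFS via REST"))  # INSTALLED = stopped
wait_for(set_service_state("HDFS", "STARTED", "Start HDFS via REST"))
```

Each call returns a queued background request, which the sketch polls to completion before moving on, just as you watched the Background Operations dialog above.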

NOTE: Closing the dialog does not cancel the operation. It is still running in the background. You can see this in the main menu:
Operation Running

What’s really useful about operations management is that you can view the details of any operation after it completes by clicking the name of the cluster (Sandbox) in the main menu and then drilling down through the operations as we did in the example.
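
The same request history is available over the REST API. Here is a minimal sketch, using the sandbox defaults assumed earlier, that lists the ten most recent background operations with their status and progress, mirroring the operations dialog:

```python
# A minimal sketch: list the ten most recent background operations,
# mirroring the operations dialog. Sandbox defaults assumed as before.
import requests

AMBARI = "http://sandbox:8080/api/v1"
AUTH = ("admin", "admin")

resp = requests.get(
    f"{AMBARI}/clusters/Sandbox/requests",
    params={
        "fields": "Requests/request_context,Requests/request_status,"
                  "Requests/progress_percent",
        "sortBy": "Requests/id.desc",  # newest first
        "page_size": "10",
    },
    auth=AUTH,
)
resp.raise_for_status()

for item in resp.json()["items"]:
    req = item["Requests"]
    print(f"[{req['request_status']:>11}] {req['progress_percent']:5.1f}%  "
          f"{req['request_context']}")
```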

Running Service Checks

As stated earlier, a running service does not necessarily imply a working service. This is why each service provides a service check. In most cases the service check simply runs a program against the service to verify it is functioning. Let’s run a service check on the HDFS service. (A scripted equivalent follows these steps.)

  1. From the HDFS Service page, click the Service Actions button, then select Run Service Check from the menu.
  2. A dialog will prompt for confirmation. Click OK to start the Service Check.
  3. A background operation will begin. Drill down to the Check HDFS task dialog: Check HDFS
    Read through the script output. Observe that the script attempts to read from and write to HDFS.
  4. When the service check passes, you will see a green check mark (depicted in the screenshot). Click OK to close the operations dialog.
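
Service checks can also be triggered programmatically. The sketch below POSTs the HDFS service-check command to Ambari’s requests endpoint, again assuming the sandbox defaults (cluster Sandbox, admin/admin); the returned URL can be polled just like any other background operation.

```python
# A hedged sketch: queue the HDFS service check, just as Run Service Check
# does in the UI. Sandbox defaults assumed (cluster "Sandbox", admin/admin).
import requests

AMBARI = "http://sandbox:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # required on write calls

body = {
    "RequestInfo": {
        "context": "HDFS Service Check via REST",
        "command": "HDFS_SERVICE_CHECK",
    },
    "Requests/resource_filters": [{"service_name": "HDFS"}],
}
resp = requests.post(
    f"{AMBARI}/clusters/Sandbox/requests",
    json=body, auth=AUTH, headers=HEADERS,
)
resp.raise_for_status()
print("Queued:", resp.json()["href"])  # poll this URL to watch the check run
```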

NOTE: In future labs you will be asked to verify that service XYZ is up and running before you start the lab. You now know how to accomplish this through service operations and service checks.

Service Versions

There are times when you will need to know the version of a service running on your Hadoop cluster. Many of these applications are under heavy development, so the feature set can vary substantially between versions.

For example, the version of Spark installed on your cluster is a couple of versions behind the current release. This matters when you are searching online for documentation, as you want the documentation to match your installed version.

Let’s walk through finding the installed version of the Spark service (a REST-based alternative follows these steps):

  1. From the Ambari main menu, select Admin, then Stack and Versions. A list of installed services will be displayed.
  2. Find Spark in the list of services. Note the version.
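
If you would rather script this lookup, the sketch below reads the cluster’s current stack and then asks that stack definition for Spark’s version, mirroring Admin > Stack and Versions. It assumes the sandbox defaults used earlier and that the cluster runs an HDP-style stack whose version string looks like HDP-2.x; adjust names for your environment.

```python
# A hedged sketch: find the installed Spark version, mirroring
# Admin > Stack and Versions. Assumes sandbox defaults and an HDP-style
# stack whose cluster version string looks like "HDP-2.6".
import requests

AMBARI = "http://sandbox:8080/api/v1"
AUTH = ("admin", "admin")

# The cluster resource reports its current stack, e.g. "HDP-2.6".
cluster = requests.get(f"{AMBARI}/clusters/Sandbox", auth=AUTH)
cluster.raise_for_status()
stack_name, stack_version = cluster.json()["Clusters"]["version"].split("-")

# The stack definition records the version of each bundled service.
svc = requests.get(
    f"{AMBARI}/stacks/{stack_name}/versions/{stack_version}/services/SPARK",
    auth=AUTH,
)
svc.raise_for_status()
print("Spark", svc.json()["StackServices"]["service_version"])
```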

Ambari views

Ambari views provide access to the client tooling through Ambari’s web interface. You can access the views from the Ambari main menu by clicking on the view icon View Icon

Notice that when you click on the views menu there are options for HDFS, Hive, Pig, YARN Queues, and Tez, among others. These are web-based versions of those tools. In some cases the web versions are advantageous over their command-line counterparts.

We will use Ambari views on occasion in future labs where it makes sense to do so. It is important to note that when lab instructions say something like: log on to Ambari as admin and open the Hive View, this implies the following steps:

  1. Log on to http://sandbox:8080 as admin
  2. Click on the Ambari View icon View Icon
  3. Select Hive from the menu.

Test Yourself

  1. What is on the main page of Ambari?
  2. Describe the process to view any running background operations from the dashboard.
  3. Explain how a service can be running (green) but still not work properly.
  4. How can you test a running service to ensure it is operating as expected?
  5. Why is it important to know which version of a Hadoop service you have installed?

Part 3: On Your Own

Exercises

Use the Ambari interface to answer these questions and complete these tasks. Be sure to include the steps you took in the Ambari UI along with the answer itself.

  1. What are the three components of the YARN Service?
  2. What are the three components of the HDFS Service?
  3. What version of Pig is installed in your cluster?
  4. What version of Ambari is installed in your cluster?
  5. Shut down the MapReduce2 service using the Ambari interface. Does this raise an alert? Which one?
  6. Start the MapReduce2 service again. Drill down through the operations output. Are there any errors?
  7. Run a service check on Spark. Did it pass?