Unit 01 Lab 1: Minidoop

Part 1: Overview

The purpose of this lab is to give you an overview of Minidoop, the Hadoop environment we will use in this course. Minidoop is the ultimate Big Data playground. It is an isolated single-user environment, which makes it well suited to learning Hadoop through exploration and experimentation. Minidoop is a full-featured Hadoop install on a single virtual machine, which reduces the complexity inherent in multi-node setups and eliminates resource contention with other users. You have full administrative rights in Minidoop, so you can truly make it your own.

About Minidoop

Here’s a quick video which explains the Minidoop environment in greater detail:

Minidoop is a network consisting of two virtual machines isolated on their own NAT:

Learning Outcomes

Upon completing this activity you will be able to:

Requirements

To complete this activity you will need:

Before You Begin

Before you start this activity make sure to:

Part 2: Walk-Through

Let’s start this activity by walking you through some basic use cases for Minidoop.

How to Login

Logging in to the Hadoop Client

You will spend the majority of your time using the Minidoop Hadoop client virtual machine, hadoop-client. Switch to the console of this VM, where you should see the Ubuntu Linux logon prompt: [Screenshot: Logon Client]

Logon as ischool with password SU2orange!. This should get you to the Ubuntu Linux desktop.

NOTE: The Minidoop client is configured to login automatically with this account at startup, but you will need to re-enter the password after a period of inactivity.

NOTE: SU2orange! is also the root password on both the hadoop-client and hadoop-cluster VMs. The root account is the Linux account with the highest level of access to the system and should only be used when you need to manipulate system settings.

Let’s open up a terminal window from the hadoop-client by clicking on the terminal icon in the toolbar. [Screenshot: Terminal Icon]

You should see the following terminal window on your desktop: [Screenshot: Terminal Window]

Tangent: About the Linux Command Line Prompt

The Linux command line prompt, also known simply as the command prompt or console, is the part to the left of the cursor where you start typing in the terminal window. In this case, it’s ischool@dsappliance:~$. The command prompt provides you with three important pieces of information: the current user (ischool), the host name (dsappliance), and the current working folder (~).
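As a rough illustration of how a prompt such as ischool@dsappliance:~$ breaks down, the sketch below splits a prompt string into its parts. The function name parse_prompt is hypothetical and not part of Minidoop, and it assumes the prompt has the form user@host:folder followed by a single marker character.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Minidoop): split a prompt of the form
# user@host:folder$ into its three parts.
parse_prompt() {
  local prompt="${1%?}"      # drop the trailing marker character ($ or #)
  local user="${prompt%%@*}" # everything before the first @
  local rest="${prompt#*@}"  # everything after the first @
  local host="${rest%%:*}"   # everything before the first :
  local dir="${rest#*:}"     # everything after the first :
  printf 'user=%s host=%s dir=%s\n' "$user" "$host" "$dir"
}

parse_prompt 'ischool@dsappliance:~$'
```

In a live terminal you can read the same three values directly with the standard commands whoami, hostname, and pwd.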

Logging in to the Hadoop cluster

There are few reasons to logon directly to the Hadoop Cluster (hadoop-cluster) virtual machine. On the rare occasion where you need to logon, here’s the procedure for logging on through the virtual machine console. (A second procedure for logging on remotely from the hadoop-client will be discussed in a future activity.)

Switch to the console of the hadoop-cluster VM, where you will see the Hortonworks Sandbox screen: [Screenshot: Hadoop Cluster Console]

Press ALT+F5 (or on a Mac CTRL+ALT+F5) to open the logon prompt.

From the Sandbox logon prompt, logon as user root with password SU2orange!.

NOTE: Ignore the on-screen instructions which tell you to logon as root / hadoop!

After you logon successfully, you will see the following Linux command prompt: [root@sandbox ~]#. This prompt is structured a little differently from the previous one on our Hadoop client, but the same principles apply. The current user is root, the host name is sandbox, and the current working folder is ~.

How to Logout

Logging out of the console or a terminal window is easy. Simply type exit from the Linux command prompt. Practice logging in and out of both the hadoop-client and hadoop-cluster until you feel comfortable with the process.

Troubleshooting the Minidoop network

Most of the time the Minidoop setup runs flawlessly. On the rare occasion that something isn’t right, it’s good to know how to troubleshoot basic network connectivity with your setup.

After you power on the virtual machines, Minidoop should be ready to use. In rare circumstances, the hadoop-cluster might not get the correct TCP/IP address because of the timing of when the virtual machines start.

To verify the Minidoop network is working properly:

  1. Open a Linux command prompt on either the hadoop-client or the hadoop-cluster.
  2. Type this command: $ ping -c 4 hadoop-cluster to ping the cluster 4 times. You should get 4 replies from sandbox (the host name) on TCP/IP address 192.168.10.11.
  3. Type: $ ping -c 4 hadoop-client to ping the hadoop-client Virtual Machine 4 times. You should get 4 replies from dsappliance (the host name) on TCP/IP address 192.168.10.12.

NOTE: You don’t type the $. It represents the command prompt itself.

You know your Minidoop setup is working properly when the ping statistics report 4 packets sent, 4 received, and 0% packet loss. Anything else indicates a problem.
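The check above can be sketched as a small script. This is only an illustration: the function names packet_loss and check_host are hypothetical, and the parsing assumes the usual Linux ping summary line containing the text "% packet loss".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print the
# packet-loss percentage from the summary line (for example 0 or 100).
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: ping a host 4 times and report whether all
# replies came back.
check_host() {
  local loss
  loss="$(ping -c 4 "$1" 2>/dev/null | packet_loss)"
  if [ "$loss" = "0" ]; then
    echo "$1: OK"
  else
    echo "$1: PROBLEM (packet loss: ${loss:-unknown})"
  fi
}
```

After loading these functions into your terminal session, you could run check_host hadoop-cluster and check_host hadoop-client to test both virtual machines.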

Correcting Network Issues

To fix the network when it’s not working:

  1. Logon to hadoop-cluster as root.
  2. From the command prompt, type:
    $ ifconfig
    to view the network setup. The screenshot below shows a correct setup for eth0:
    [Screenshot: ifconfig output]
    If the inet addr: value is anything other than 192.168.10.11, continue with step 3.
  3. ONLY DO THIS STEP IF THE IP ADDRESS ON YOUR Hadoop Cluster IS NOT 192.168.10.11:
    Remove the persistent-net.rules file, which stores the network configuration for this virtual machine. Type:
    $ rm /etc/udev/rules.d/70-persistent-net.rules
    This deletes the file. You will then need to reboot the Hadoop Cluster. Type:
    $ reboot
  4. After the Hadoop Cluster returns to the logon screen, try the ping commands from the troubleshooting section once more.
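The address check in step 2 can also be done without reading the ifconfig output by eye. The sketch below is an assumption-laden illustration: the function names inet_addr and report_addr are hypothetical, and the parsing assumes the older ifconfig output format that labels the address as inet addr:, as described above. It only reports the result; it does not delete any file or reboot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ifconfig output on stdin and print the first
# IPv4 address that follows the "inet addr:" label.
inet_addr() {
  sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: compare an address with the one the
# hadoop-cluster is expected to have.
report_addr() {
  if [ "$1" = "192.168.10.11" ]; then
    echo "eth0 address is correct"
  else
    echo "eth0 address is ${1:-missing}; step 3 applies"
  fi
}
```

On the hadoop-cluster you could then run report_addr "$(ifconfig eth0 | inet_addr)" and continue with step 3 only if it says the address is wrong.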

Opening Multiple Command Prompts

At some point, you might need to open multiple command prompts from your hadoop-client. To open another terminal window from an existing command prompt, press CTRL+Shift+n.

Test Yourself

  1. How many Virtual Machines are part of the Minidoop setup? (Don’t include the router).
  2. Which Linux account has the highest level of access to the system?
  3. For the following command prompt, identify the name of the login user, computer name, and current working directory: this@isa:test$
  4. What command do you type to logout?
  5. What is the IP Address of the Hadoop-Client virtual machine?
  6. What is the IP Address of the Hadoop-Cluster virtual machine?

Part 3: On Your Own

Now that you have a basic understanding of Minidoop, it is time to put your newfound skills into practice.

Restart all virtual machines in your Minidoop setup.

Questions

Attempt the following steps, answering the questions where appropriate:

  1. Open a terminal window on hadoop-client. What are the host name and current user?
  2. Logon to the hadoop-cluster using the account with the highest level of access. What are the host name and current user?
  3. Test your network connectivity. Report the packet loss for both hadoop-cluster and hadoop-client. Fix your network if required. How do you know it is working?
  4. Logout of both command prompts.