Hadoop Programming on the Hortonworks Data Platform

Lab 1 – Learning the Lab Environment

The steps below will help you understand the lab environment for this class and walk you through some of the repeated tasks that you will be required to perform in subsequent labs. You may also, if necessary, use them to refresh your memory of the commands and tools commonly used in Unix-like environments.

Part 1 – Connecting to the Lab Environment

There are a number of ways your class lab environment can be set up. Below is an overview of the steps you need to take to be able to start doing your class labs. Those steps are in addition to the steps required to connect to the cloud-based lab environment, which are sent to you in a separate communication from the class support specialist and normally include an RDP connection file as an attachment (the file may have a different name or extension).

The RDP connection will allow you to connect to the Windows Gateway Server from which you will need to connect to your dedicated Unix-based Lab Server using the PuTTY SSH client. In some cases, we will be referring to PuTTY simply as an SSH client.

Note 1: You will be provided the IP address of your dedicated Lab Server either by the class support specialist via email in advance or by the instructor while in the class.

Note 2: If you don’t have a shortcut to PuTTY, create one by pointing to PUTTY.EXE located in the c:\Software\PuTTY folder.

__1. Double-click the PuTTY shortcut on the desktop of your Windows Gateway Server.

__2. In the PuTTY dialog window that opens, make sure that the Connection type is set to SSH and the port is 22.

__3. In the Host Name (or IP address) input box, enter the IP address of your Unix Lab Server.

__4. Click the Open button at the bottom of the PuTTY dialog window.

If you are presented with security warnings or other such alerts requiring your action, confirm that you trust the target server and accept the responsibility of doing so by clicking Yes or OK, as appropriate. For example, if you see a host-key verification dialog, click the Yes button.

__5. Enter root for logon id and $123_doop for password, when prompted.

You should be able to connect to the Lab Server and be placed at its command prompt. Remember that you are logged in as root, so use extra care when running commands under this account.

By default, a PuTTY session allocates only 200 scrollback lines in its terminal; we may need more in our labs.

__6. In the PuTTY terminal window, open the system menu by clicking the top left-hand icon and select Change Settings…

__7. In the PuTTY Reconfiguration dialog that opens, locate the Window | Lines of scrollback text box which should have the default 200 value.

__8. Change the 200 value to 9999 and click Apply.

Note: This configuration setting will only hold for the duration of the current PuTTY session and you will need to reset it should you start a new session. You can use some simple steps to persist custom configuration settings across all sessions bound to the target IP address, which we don’t review here.

Part 2 – Restarting Session

If your connection to the Lab Server is lost, try reconnecting using PuTTY’s system menu by selecting Restart Session. When prompted, provide your login credentials.

Part 3 – Update the hosts file on the Windows Gateway Server

In some labs, you will be required to use your browser to view content served by embedded web servers of various Hadoop ecosystem products. In many cases, using only the target HTTP server’s IP address / port is not sufficient – you will need to use the host name of your Lab Server, which you can achieve by adding the Lab Server’s host name to the hosts file on the Windows Gateway Server. This file can usually be found in the C:\Windows\System32\drivers\etc directory.

When updating the hosts file, you need to run your text editor as Administrator:

__1. Open Notepad as Administrator by right-clicking the Notepad link and selecting the Run as administrator context menu option (the way you access Notepad may differ on your system).

If and when presented with a security warning dialog, click Yes or OK to acknowledge your intent.

__2. Open the hosts file; the path to the file depends on the version of your Windows Gateway Server, and should be something like this one:

C:\Windows\System32\drivers\etc\hosts

__3. Enter the following line at the bottom of the file, using the IP address of your dedicated Lab Server (we use 192.168.89.133; yours will be different). Use a tab as a separator between the IP address of the target host and the host name of your Lab Server (sandbox.hortonworks.com), which should be the same for all Lab Servers in your class.

192.168.89.133 sandbox.hortonworks.com

__4. Save the file and close Notepad.

Part 4 – Using the vi Editor

Throughout subsequent labs, you may be required to create new files or edit existing ones. You can use any editor that is installed on the Lab Server and that you are familiar with. Below we provide a quick overview of the vi editor. To edit an existing file or create a new one, pass the file name as an argument to vi.

__1. Enter the following command to create a new file foobar:

vi foobar

The vi editor opens displaying the contents of the existing file (that you want to edit) or an empty window for the new file as is the case with the foobar file. The vi editor supports two modes: edit mode and command mode. By default, you are in command mode. To switch to edit (insert) mode, you need to press i on the keyboard.

__2. Press i on the keyboard.

You should see the -- INSERT -- line appear at the bottom of the editor window. At this point you can start typing your text. To switch back to command mode, press Esc followed by a colon (:). The colon that appears at the bottom of the screen is the command prompt, ready to accept your commands. We will use just the save and exit commands.

Whenever you want to save your file without leaving the editor, you need to switch to command mode and type w at the : command prompt. For now, make sure you are in insert (editing) mode (you should see the -- INSERT -- line at the bottom of the editor).

__3. Type in some text in the editor window.

__4. Press Esc followed by :

You should see the command prompt (:) appearing at the bottom of the screen:

__5. Press w at the : command prompt and press Enter.

This command saves changes to your file and creates it if it is a new file. If you want to save and exit the editor, you need to enter wq at the command prompt (:). In subsequent labs, this command will be referred to as “Save the file and exit the editor”.

If you want to exit the editor without saving, you need to enter q! at the command prompt (:).

__6. Switch back to the edit (insert) mode by pressing i and type in some more text.

__7. Switch to command mode by pressing Esc followed by :

__8. Enter wq at command prompt.

__9. Press Enter to submit the command.

You should exit the vi editor window.

Note: To open a file in vi in read-only mode, supply the -R flag, e.g. vi -R your_file.dat

__10. Enter the following command:

rm -f foobar

This command removes the foobar file from the file system without prompting for confirmation (the -f flag).
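As an aside, if you ever need to create a small file non-interactively (for example, inside a script), a shell heredoc is a handy alternative to vi. This is a supplementary sketch, not a lab step:

```shell
# Create a file named foobar without opening an editor; the lines
# between <<'EOF' and the closing EOF become the file's contents.
cat > foobar <<'EOF'
first line of text
second line of text
EOF

# Verify the contents, then remove the file as the lab step above does.
cat foobar
rm -f foobar
```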

Part 5 – Sundry Commands and Techniques

The big thing to remember is that Unix / Linux commands, file names, and folder names are case-sensitive, such that the directories foo and Foo are different and can peacefully coexist in the same parent folder.

To view the contents of a file use the cat command, e.g. cat myfile.dat

Before you do this, first check the size of the file using the ls -lh command, which can take the name of the file as an argument or be issued without any arguments, in which case it lists details of all files and directories in the current directory. If the file is big, or you would just like to browse through it, use the less command, which takes the name of the file as an argument, e.g. less your_filename

The less command helps you navigate through the file back and forth using the arrow, PgUp, and PgDn keys; when you are done and wish to exit the browsing window, just press q.
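The size-check-then-view flow described above can be sketched as follows (myfile.dat is a hypothetical file created just for the demonstration):

```shell
# Create a small sample file.
echo "hello from the lab" > myfile.dat

# Check its size first; -h prints a human-readable size.
ls -lh myfile.dat

# The file is tiny, so it is safe to dump it to the terminal.
cat myfile.dat

# For a big file, browse it instead:  less myfile.dat   (press q to quit)
rm -f myfile.dat
```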

The Linux terminal window supports the folder and file name auto-completion feature which allows you to type the initial part of the directory or file name and complete it by pressing the Tab key.

Use the following short-cuts to quickly navigate and edit the command line:

Ctrl-a Moves the cursor to the start of the line.
Ctrl-e Moves the cursor to the end of the line.
Ctrl-k Deletes (kills) the line contents to the right of the cursor.
Ctrl-u Deletes the line contents to the left of cursor.
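The case-sensitivity rule mentioned at the start of this part is easy to verify; here is a quick sketch in a scratch directory (all names are illustrative):

```shell
# foo and Foo are distinct names on a case-sensitive file system.
mkdir scratch
cd scratch
mkdir foo Foo

# Both directories coexist peacefully in the same parent folder.
ls

# Clean up.
cd ..
rm -rf scratch
```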

Lab 2 – Getting Started with Apache Ambari

Apache Ambari is an Admin Web UI for provisioning, managing and monitoring small and large Apache Hadoop clusters.

Ambari is built on top of a collection of tools and APIs that dramatically simplify the administration of Hadoop. The Hortonworks Data Platform (HDP) uses the Ambari service as its primary Admin Web UI.

In this lab, you will learn how to use Apache Ambari.

For this lab, the Chrome browser is recommended for better user experience.

Part 1 – Log in to Ambari

__1. Open your Chrome browser and navigate to the Ambari login page at http://<YOUR LAB SERVER IP>:8080/

__2. Log in as admin / admin

You should be able to log in to the Ambari Dashboard page. Familiarize yourself with the layout of the main page. The services installed on the Lab Server are listed in the left-hand navigation bar. Some of the services may be stopped; others are up and running. By clicking a service link in the navigation bar, you can get to that service’s page.

Note: You can always get back to the main page by clicking the Dashboard link in the menu bar.

The center of the Dashboard page presents you with a list of various system run-time metrics and useful links, such as HDFS links, and much more.

Let’s use Ambari to stop some of the services in order to conserve system resources on our Lab Server.

__3. Click the Oozie service link.

You should be taken to the Oozie service page.

__4. Click the Service Actions button in the top right-hand corner and select Stop.

__5. In the Confirmation dialog, click Confirm Stop.

You will be presented with the operation progress dialog.

__6. Click OK (you may click it before the stopping operation completes).

__7. Repeat the service stopping operation for Flume, Ranger and Zeppelin Notebook

At the end of these operations, the stopped services should show a stopped status in the navigation bar.

Part 2 – Get Technology Stack and Versions Info

You often need to know the versions of particular pieces of software installed on the server. Ambari makes this task quite easy.

__1. Click the Admin menu in the top right-hand corner of the Dashboard and select Stack and Versions.

You should see the Stack tab listing the installed software along with version numbers and, as a nice bonus, a short description of each service. The version of the platform can be viewed by clicking the Versions tab.

__2. Click the Versions tab.

You should see the current version of the HDP platform.

Part 3 – Work with Files

You can upload files from your local file system to your remote Lab Server and download files from it, where your permissions allow the relevant operation.

In this lab part, we will upload a text file on the local file system to the /tmp folder on the Lab Server.

__1. On your student computer from which you accessed the Lab Server, create a text file foobar.txt using your text editor (e.g. Notepad) and place it on the desktop.

__2. Click the “checkers block” icon in the toolbar on the left of the admin menu.

__3. Select Local Files.

You should be presented with a Web view of the Lab Server’s file system. Keep in mind that not all directories are accessible through the Ambari UI. We will use the /tmp folder, which is accessible by all users.

__4. Click the tmp folder.

__5. In the top right-hand corner, click Upload

__6. Click the Browse … button in the dialog that pops up and in the Open file dialog that opens on your local file system, select the foobar.txt file on your Desktop.

The upload dialog in Ambari should get updated to list foobar.txt file ready for upload.

__7. Click the Upload button to the right of Clear.

__8. In the Search File Names dialog, start typing foobar.txt

The UI will narrow down the matching options to locate the file you are searching for. There are a couple of things worth noticing. The file is created under the root account. You can download the file, move it, or delete it (operations represented by icons on the right of the file info line).

Let’s just delete the file.

__9. Click the “garbage bin” button in the file info line.

__10. In the in-line file deletion confirmation dialog that opens, check Delete forever and click the OK check icon next to it.

Currently, the Ambari UI fails to refresh the web page to reflect the delete operation, but the file is deleted all right.

__11. Close the Search File dialog by clicking the X.

You should see the full content of the /tmp directory.

__12. Click the Dashboard link in the menu bar.

__13. Sign out of Ambari by selecting the admin | Sign out option.

This is the last step in this lab.

Part 4 – Review

In this lab, you learned some of the features of Apache Ambari.

Lab 3 – The Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is the main component of Hadoop. It offers developers both an Application Programming Interface (API) and a command-line interface (CLI) to interact with it.

In this lab, we will learn how to interact with HDFS using its CLI commands and how to access HDFS using a web browser.

Part 1 – Connect to the Lab Environment

__1. Using your SSH client, connect to the Lab Server. Use root/$123_doop credentials when prompted if you are not already connected.

__2. Once connected to the server, switch user from root to hdfs:

su hdfs

__3. Enter the following command to confirm that you are, indeed, the hdfs user now:

whoami

Part 2 – The Lab Working Directory

All the steps in this lab will be performed in the /home/hdfs/Works/ directory.

__1. In the terminal window, type in the following command:

cd ~/Works

Note: the ‘~’ alias refers to the home directory of the current user, which is /home/hdfs/ in our case.

If you see this message:

bash: cd: /home/hdfs/Works: No such file or directory

Create the folder:

mkdir ~/Works

Repeat the command:

cd ~/Works
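The create-if-missing sequence above can be collapsed into a single idempotent command using mkdir’s -p flag; a sketch that is safe to run whether or not ~/Works already exists:

```shell
# -p creates the directory if needed and does nothing (without an error)
# if it already exists, so the cd that follows always succeeds.
mkdir -p ~/Works
cd ~/Works
pwd
```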

__2. Enter the following command:

pwd

You should see the current working directory:

/home/hdfs/Works

Note: The above home directory of the hdfs user is located on the local file system; HDFS hosts its own file system and the home directory of the hdfs user (which has already been created on HDFS for you) is /user/hdfs/. Note that the /user/hdfs/ directory does not exist on the local file system.

Part 3 – Getting Started with HDFS

All interactions using the HDFS CLI in this lab will be done via Hadoop’s File System (FS) shell that is invoked by this command: hadoop fs

Note 1: You can also interface with HDFS using the hdfs admin tool using this command: hdfs dfs

Note 2: The old (and now deprecated) reference to the FS shell is hadoop dfs

Most of the commands in the FS shell have syntax and functionality similar to those of corresponding Unix File System commands.

__1. Enter the following command:

hadoop fs

You should see a list of supported commands that you can submit through the HDFS CLI (the list below is abridged for space.)

[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]

Generic options supported are
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to
the map reduce cluster

The critical point to notice here is that all commands must be prefixed with a dash '-'; e.g. to get a directory listing, the following command needs to be issued: hadoop fs -ls

To get help on a specific command, type the name of that command after the -help switch.

For example, to get help on the rm (remove file or directory) command, type this command:

hadoop fs -help rm

__2. Enter the following command:

hadoop fs -ls

You should get no output on the console. By default, the -ls HDFS command prints the listing of your home directory in HDFS, and currently it is empty as you have not yet uploaded any files there.

Note: The above command is functionally equivalent to this one: hadoop fs -ls /user/hdfs

Let’s see how HDFS is structured on your system.

__3. Enter the following command:

hadoop fs -ls /

You should see the following output of the top folder of HDFS (the timestamps and folder listing may differ in your case).

Found 9 items
drwxrwxrwx - yarn hadoop 0 <TIME STAMP> /app-logs
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /apps
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /demo
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /hdp
drwxr-xr-x - mapred hdfs 0 <TIME STAMP> /mapred
drwxrwxrwx - mapred hadoop 0 <TIME STAMP> /mr-history
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /ranger
drwxrwxrwx - hdfs hdfs 0 <TIME STAMP> /tmp
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /user

Note 1: HDFS closely follows the Unix File System listing layout.

Note 2: HDFS folders do not exist on the local file system even though names may be the same, e.g. /tmp.

User-specific files and directories are stored in their account-specific directories under the /user directory.

__4. Enter the following command:

hadoop fs -ls /user

You should see the following output (some details in your output may differ):

Found 11 items
drwxrwx--- - ambari-qa hdfs 0 <TIME STAMP> /user/ambari-qa
drwxr-xr-x - guest guest 0 <TIME STAMP> /user/guest
drwxr-xr-x - hcat hdfs 0 <TIME STAMP> /user/hcat
drwx------ - hdfs hdfs 0 <TIME STAMP> /user/hdfs
drwx------ - hive hdfs 0 <TIME STAMP> /user/hive
drwxrwxrwx - hue hdfs 0 <TIME STAMP> /user/hue
drwxrwxr-x - oozie hdfs 0 <TIME STAMP> /user/oozie
drwxr-xr-x - solr hdfs 0 <TIME STAMP> /user/solr
drwxrwxr-x - spark hdfs 0 <TIME STAMP> /user/spark
drwxr-xr-x - unit hdfs 0 <TIME STAMP> /user/unit
drwxr-xr-x - zeppelin zeppelin 0 <TIME STAMP> /user/zeppelin

You are a legitimate tenant at the /user/hdfs location on HDFS, which is your home directory.

You may also change to the directories of other users (e.g. zeppelin) as long as those directories have the x flag set for other users (the last character in the directory permission list), e.g. drwxr-xr-x. Also, most users on our system belong to the same group, hdfs, which makes collaboration much easier.

Part 4 – Putting Files on HDFS

You put files on HDFS by using either of these commands: copyFromLocal or put; we will be using put.

Note: You can get help on the put command by issuing this command:

hadoop fs -help put

An abridged output of the above command is shown below (make a note of the -f flag):

-put [-f] [-p] [-l] <localsrc> ... <dst> :

Copy files from the local file system into fs.

Copying fails if the file already exists, unless the -f flag is given.

Flags:
-p Preserves access and modification times, ownership and the mode.
-f Overwrites the destination if it already exists.

Let’s create a file that we are going to copy from the local file system over to HDFS.

__1. Enter the following command:

ls -l /usr/bin > usr_bin_dir.dat

This command will capture the listing of the /usr/bin folder in the usr_bin_dir.dat file in our working directory.

__2. Enter the following command:

ls -lh

Our file should be listed in the output; its size is a bit small for Hadoop, but it is OK for our educational purposes.

__3. Enter the following command:

hadoop fs -put usr_bin_dir.dat

If the command returns nothing, the file was uploaded successfully (no news is good news!)

This is the simplest possible form of the command: it copies the usr_bin_dir.dat file from the local file system to HDFS, where it is assigned the same name, usr_bin_dir.dat. Note that the file on the local system stays untouched.

__4. Enter the following command:

hadoop fs -ls

You should see that your file has been successfully copied over to HDFS.

Found 1 items

-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> usr_bin_dir.dat

The number 3 in front of the hdfs account name is the number of replicated copies of your file. In our lab, we are running in pseudo-distributed deployment mode which, nevertheless, maintains the usual (default) 3 replicas of every file block on HDFS.

Note: HDFS protects you from accidentally overwriting an existing file with the same name and location, so if you run the above command again (hadoop fs -put usr_bin_dir.dat), you will be presented with this message: put: `usr_bin_dir.dat': File exists. OK. Now, what if you want a different (e.g. a shorter or a fancier) name for the target file in HDFS? For this, you need to specify that name in the put command.

__5. Enter the following command:

hadoop fs -put usr_bin_dir.dat files

Now, if you issue the hadoop fs -ls command, you will see the following output:

Found 2 items
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> files
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> usr_bin_dir.dat

The usr_bin_dir.dat file was persisted on HDFS as files.

Now let’s see how to get file(s) back from HDFS to the local file system. This should be a two-way street, right?

Part 5 – Getting Files from HDFS

You can get file(s) from HDFS using either of these commands: copyToLocal or get, which are functionally equivalent. We will use the shorter one, get.

__1. Enter the following command:

hadoop fs -get files

You should get the HDFS file files (which is a copy of the usr_bin_dir.dat file).

__2. Enter the following command:

diff files usr_bin_dir.dat

You should get no differences between the files (no output on your console).
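Besides the empty output, diff also signals sameness through its exit status (0 for identical files), which is handy in scripts. A local sketch with two hypothetical files:

```shell
# Two identical files: diff prints nothing and exits with status 0,
# so the command after && runs.
echo "same content" > a.dat
cp a.dat b.dat
diff a.dat b.dat && echo "files are identical"

# Clean up.
rm -f a.dat b.dat
```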

Note:

You can also fetch a whole HDFS directory, or all files in a particular directory that match a specific name pattern, to the local file system. For example, hadoop fs -get /user/hdfs/* will fetch all the files from your HDFS home directory (here we use the fully qualified directory name /user/hdfs/ and the wildcard '*' for all the files located there).

To get a whole HDFS directory, you need to run the -get command referencing the source directory you want to copy locally with the terminating slash: ‘/’, e.g. hadoop fs -get /user/hdfs/

Part 6 – Creating Folders

It is always a good idea to have a way to partition your file system into separate folders. The HDFS design is no exception to this rule with full support for this file organizational task.

__1. Enter the following command:

hadoop fs -mkdir REPORT

This command will create the REPORT folder under /user/hdfs/ (the /user/hdfs/REPORT/ directory).

Now, let’s see how you can put multiple files from the local File System to HDFS.

__2. Enter the following command:

hadoop fs -put usr_bin_dir.dat files REPORT

This command copies both the files and usr_bin_dir.dat files to the /user/hdfs/REPORT directory. You can verify the creation of the files using either command:

hadoop fs -ls /user/hdfs/REPORT
or
hadoop fs -ls REPORT

Part 7 – Moving Files Around

You move files around the HDFS system using the mv command. When you are moving multiple files, the destination argument to the mv command must be a directory.

Note: To get help on this command, run: hadoop fs -help mv

__1. Enter the following command:

hadoop fs -mv REPORT/files files.moved

This command moves the files file from the REPORT directory back to the hdfs user’s home directory under the name files.moved. Now if you run the hadoop fs -ls REPORT command, you will see that the files file was, indeed, removed from the source location.

__2. Enter the following command to see the results of your activities on HDFS so far:

hadoop fs -ls -R

You should see the following output:
drwxr-xr-x - hdfs hdfs 0 <TIME STAMP> /user/hdfs/REPORT
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> /user/hdfs/REPORT/usr_bin_dir.dat
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> /user/hdfs/files
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> /user/hdfs/files.moved
-rw-r--r-- 3 hdfs hdfs 65202 <TIME STAMP> /user/hdfs/usr_bin_dir.dat

Part 8 – Deleting (Removing) Files

You delete files / directories with the rm command. The syntax is quite simple:
hadoop fs -rm <file to delete>

You should be aware of one caveat when using the rm command: depending on your HDFS configuration, your file could actually be retained in the <user home dir>/.Trash folder, which HDFS conveniently creates if it does not already exist. Having the file stashed in the trash bin may be a good idea in case you still need it in the future. On the other hand, it may be a bad thing if you know you will never need the file again, as keeping it around (particularly a really big one) consumes resources on the Name Node and disk space on the Data Node(s).

Let’s delete a file and see what happens.

__1. Enter the following command:

hadoop fs -rm files.moved

You will get a confirmation message to this effect:
<TIME STAMP> INFO fs.TrashPolicyDefault: Namenode trash configuration:
Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://sandbox.hortonworks.com:8020/user/hdfs/files.moved' to trash at: hdfs://sandbox.hortonworks.com:8020/user/hdfs/.Trash/Current

To prevent file retention in the .Trash folder (provided the “trashing” feature is enabled), pass the -skipTrash flag to the rm command, which permanently removes the file from HDFS. Here is how you run this command.

__2. Enter the following command:

hadoop fs -rm -skipTrash usr_bin_dir.dat

This command will completely remove the usr_bin_dir.dat file from HDFS whether or not the trashing feature is enabled.

Note: The HDFS Trash facility is designed with enterprise use cases in mind. You can specify the minimum period (in minutes) that a deleted file will remain in the .Trash folder using the fs.trash.interval configuration property in core-site.xml. By default, fs.trash.interval is zero, which disables trash. The Trash facility is available through the HDFS FS shell; if files are deleted programmatically (using the HDFS FS API), they are removed immediately. The HDFS FS shell also lets you expunge deleted files from the .Trash folder that have been there longer than the prescribed retention period; the command for this is: hadoop fs -expunge
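As a sketch (the value is illustrative; the lab output above showed a 360-minute deletion interval), enabling trash retention in core-site.xml would look like this:

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- Minutes a deleted file is kept in .Trash; 0 disables the Trash facility -->
  <value>360</value>
</property>
```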

Part 9 – Navigating HDFS with the Web Browser

__1. Start your browser and navigate to:

http://<IP Address of Your Lab Server>:50070

__2. Click Utilities → Browse the file system link on the Name Node page.

__3. Navigate to /user/hdfs

You should get an access permission-related error.
The problem is that the /user/hdfs directory is not configured to be viewable by the browser user (user=dr.who).

Let’s fix this problem.

__4. Switch to your terminal window.

__5. Enter the following command:

hdfs dfs -chmod go+rx /user/hdfs/

The above command changes the folder permissions to drwxr-xr-x, adding read and execute permissions for the group and others (making the folder viewable by everyone).
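The go+rx notation used above is standard Unix chmod syntax, which hdfs dfs -chmod mirrors; here is the same permission change sketched locally on a scratch directory:

```shell
# Start with a directory only its owner can enter (mode 700).
mkdir demo_dir
chmod 700 demo_dir
ls -ld demo_dir        # shows drwx------

# Grant read and execute (traverse) permission to group and others.
chmod go+rx demo_dir
ls -ld demo_dir        # shows drwxr-xr-x

# Clean up.
rm -rf demo_dir
```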

__6. Switch back to the browser window and repeat the change directory command to

/user/hdfs/

Now you should be allowed to change directory and see the contents of your home directory on HDFS.

Take some time to familiarize yourself with the HDFS file system as reported by Name Node web UI.

Note: You still will not be able to change to the .Trash folder due to the restrictive permissions set on this directory.

__7. When you are done, close the browser.

Part 10 – Working Area Clean-up

We are not going to use any of the artifacts we created in this lab in future labs, so it is a good idea to delete those unneeded resources.

__1. Switch back to the terminal window.

__2. Enter the following command:

pwd

You should see that you are in the /home/hdfs/Works directory.

If you are not, change to the /home/hdfs/Works directory.

__3. Enter the following command to delete all files in the working directory:

rm -f *

__4. Enter the following command:

ls

There should be no files left in the working folder.
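One caveat worth knowing about rm -f *: the shell glob * does not match dot-files, and rm without -r refuses to remove directories, so both survive the wipe. A supplementary sketch in a scratch directory (names are illustrative):

```shell
mkdir scratch
cd scratch
touch one.txt two.txt .hidden
mkdir subdir

# Removes one.txt and two.txt; the glob * skips .hidden, and rm
# prints an error for subdir because directories need the -r flag.
rm -f *

# -A lists hidden entries too: .hidden and subdir remain.
ls -A

cd ..
rm -rf scratch
```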

Now the same work needs to be done on HDFS.

__5. Enter the following command (don’t forget /* at the end of the command!):

hadoop fs -rm -r -skipTrash /user/hdfs/*

You should see a confirmation message on the deleted resources.

__6. Enter the following command:

hadoop fs -ls /user/hdfs

The command should return no results.

Part 11 – Ending the Working Session

__1. Execute the exit command twice in the terminal window to close it.
This is the last step in this lab.

Part 12 – Review

In this lab, we learned how to work with the Hadoop Distributed File System (HDFS) using its command-line interface represented by the HDFS File System shell. We also looked at how to navigate through HDFS using your browser.