Java Development for Apache Hadoop Training

Course #: WA2422

Java Development for Apache Hadoop Training (Coming Soon)

Apache Hadoop is gaining popularity with organizations that want a cost-efficient way to derive business insights from the massive amounts of data they generate in running their businesses. Hadoop is written in Java, and Java is the primary programming language for making full use of the Hadoop analytics platform, which is based on the MapReduce programming model.


This intensive training course covers both theoretical and technical aspects of designing and developing applications for the Apache Hadoop platform using Java.

The course is supplemented by hands-on labs that help attendees reinforce the material covered in the lectures.


  • The Hadoop Big Data platform
  • The Hadoop Distributed File System (HDFS)
  • The MapReduce programming model
  • Pig
  • Hive
  • HBase


Java Developers, Data Scientists, Big Data Technical Leads


No prior Hadoop knowledge is required. Attendees must have working experience with the Java programming language.


4 Days

Outline of Java Development for Apache Hadoop Training

1. MapReduce Overview

  • MapReduce Shared-Nothing Architecture
  • Similarity with SQL Aggregation Operations
  • Problems Suitable for Solving with MapReduce
  • Fault Tolerance of MapReduce Processes
  • Distributed Computing Economics
  • Amazon Elastic MapReduce 

2. Hadoop Platform Overview

  • Typical Hadoop Applications
  • High-Level Hadoop Architecture
  • Hadoop's Core Components
  • HDFS Architecture
  • Accessing HDFS
  • MapReduce on Hadoop
  • The "Classic MapReduce" (MRv1) vs YARN
  • Overview of Hadoop-based Systems for Data Analysis

3. Hadoop Java Development

  • MapReduce Programming Options
  • Java MapReduce API
  • The Structure of a Java MapReduce Program
  • The Mapper Class
  • Combiner (Optional)
  • The Reducer Class
  • The Driver Class
  • Compiling Classes
  • Running MapReduce Jobs
  • Hadoop's Streaming MapReduce
  • Streaming MapReduce Use Cases
  • The Streaming API vs Java MapReduce API
  • Built-in Input and Output Formats
  • Writing a Custom InputFormat Class
  • Writing a Custom OutputFormat Class
  • Using the Distributed Cache
  • Unit Testing Java MapReduce Jobs with MRUnit

4. Performance Tuning

  • MapReduce Job Optimization
  • Using Partitioners and Comparators
  • Collecting Job Run-Time Statistics with Counters

5. Java Development for Hadoop-centric Analytics Systems

  • Creating User-Defined Functions for Pig
  • Creating User-Defined Functions for Hive
  • Interfacing with HBase
  • HBase Scanners
  • Using ResultScanner Efficiently