Java Development for Apache Hadoop Training (Coming Soon)

Course#: WA2422

Apache Hadoop is gaining popularity with organizations that want a cost-efficient way to derive business insights from the massive amounts of data they generate in running their businesses. Hadoop is written in Java, and Java is the primary programming language for using the full capabilities of the Hadoop analytics platform, which is based on the MapReduce programming model.

OBJECTIVES

This intensive training course covers both theoretical and technical aspects of designing and developing applications for the Apache Hadoop platform using Java.

The course includes an assortment of hands-on labs that help attendees reinforce the material covered.

TOPICS

  • The Hadoop Big Data platform
  • The Hadoop Distributed File System (HDFS)
  • The MapReduce programming model
  • Pig
  • Hive
  • HBase

AUDIENCE

Java Developers, Data Scientists, Big Data Technical Leads

PREREQUISITES

No prior Hadoop knowledge is required. Attendees must have working experience with the Java programming language.

DURATION

4 Days

Outline of WA2422 Java Development for Apache Hadoop Training

1. MapReduce Overview

  • MapReduce Shared-Nothing Architecture
  • Similarity with SQL Aggregation Operations
  • Problems Suitable for Solving with MapReduce
  • Fault Tolerance of MapReduce Processes
  • Distributed Computing Economics
  • Amazon Elastic MapReduce 
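
The "Similarity with SQL Aggregation Operations" topic above can be illustrated with a word count, which is roughly the MapReduce counterpart of `SELECT word, COUNT(*) ... GROUP BY word`. The following is a minimal sketch in plain Java that simulates the map, shuffle, and reduce phases in a single process. It does not use the Hadoop API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `run`) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model (assumption: single process,
// no Hadoop classes). A real Hadoop job distributes these phases across nodes.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (similar to GROUP BY).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: aggregate the values of one key (similar to COUNT or SUM).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over the given input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, insights=1, pipelines=1}
        System.out.println(run(List.of("big data big insights", "data pipelines")));
    }
}
```

Because each call to `map` sees only its own line and each call to `reduce` sees only the values of one key, the phases share no state, which is what allows a framework to spread them across many machines.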

2. Hadoop Platform Overview

  • Typical Hadoop Applications
  • High-Level Hadoop Architecture
  • Hadoop's Core Components
  • HDFS Architecture
  • Accessing HDFS
  • MapReduce on Hadoop
  • The "Classic MapReduce" (MRv1) vs YARN
  • Overview of Hadoop-based Systems for Data Analysis

3. Hadoop Java Development

  • MapReduce Programming Options
  • Java MapReduce API
  • The Structure of a Java MapReduce Program
  • The Mapper Class
  • Combiner (Optional)
  • The Reducer Class
  • The Driver Class
  • Compiling Classes
  • Running MapReduce Jobs
  • Hadoop's Streaming MapReduce
  • Streaming MapReduce Use Cases
  • The Streaming API vs Java MapReduce API
  • Built-in Input and Output Formats
  • Writing a Custom InputFormat Class
  • Writing a Custom OutputFormat Class
  • Using the Distributed Cache
  • Unit Testing Java MapReduce Jobs with MRUnit

4. Performance Tuning

  • MapReduce Job Optimization
  • Using Partitioners and Comparators
  • Collecting Job Run-Time Statistics with Counters

5. Java Development for Hadoop-centric Analytics Systems

  • Creating User-Defined Functions for Pig
  • Creating User-Defined Functions for Hive
  • Interfacing with HBase
  • HBase Scanners
  • Using ResultScanner Efficiently

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.