Course #:TP2749

Advanced Hive for HDP Developers Training

This is a beginner-to-advanced training course on Hive. The course uses the version of Hive bundled with Hortonworks Data Platform (HDP) 2.6.

Objectives

This intensive training course combines lectures and hands-on labs that help students learn the theory behind Hive and gain practical experience with it.

Topics

•    Introduction
•    Architectural View
•    Hive Basics
•    Advanced Hive
•    Extending Hive

Audience

Developers/Architects/Team Leads/Data Analysts/Data Scientists

Prerequisites

Participants should have general knowledge of programming and SQL, as well as experience working in Unix environments (e.g., running shell commands). Participants should also be familiar with HDFS.

Duration

2 Days

Outline of Advanced Hive for HDP Developers Training

CHAPTER 1. INTRODUCTION TO HIVE

•    What is Hive?
•    Why Hive?
•    Hive in Big Data Analytics Projects
•    Hive vs Other Key Tools
•    Hive vs RDBMS
•    Hive vs HBase
•    Hive Use Cases
•    Summary

CHAPTER 2. ARCHITECTURAL VIEW

•    Logical Architecture of Hive
•    MapReduce Concepts
•    MapReduce Architecture
•    YARN Introduction
•    Spark and Hive Integration
•    Hive Old Architecture
•    Hive Server2 Architecture
•    Summary

CHAPTER 3. HIVE BASICS

•    Hive Syntax
•    Hive Datatypes
•    Beeline shell
•    Creating Databases and Tables
•    Populating Tables
•    Views
•    HCatalog
•    Summary

CHAPTER 4. ADVANCED HIVE

•    Working with Complex Datatypes
•    Data Formats
•    Hive with various Data Formats
•    Working with compressed data
•    Converting data from one format to another
•    Role of Regular Expressions in Hive
•    Partitioning
•    Joining
•    Bucketing
•    Indexing
•    De-Duplication
•    Summary

CHAPTER 5. HIVE CUSTOM EXTENSIONS

•    Open Source Options for extending Hive
•    User Defined Functions
•    Various types of UDFs
•    SerDes
•    Summary

LAB EXERCISES

•    Lab 1. Basic Hive Exercises
•    Lab 2. Table Partitioning in Hive
•    Lab 3. Dealing with Semi-structured data in Hive
•    Lab 4. Working with Copybook format (Mainframe data) in Hive
•    Lab 5. Processing Data Formats through Hive
•    Lab 6. Complex Datatypes in Hive
•    Lab 7. UDFs and UDAFs
•    Lab 8. Bucketing, plus additional labs depending on available time

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.