Course #: WA2867

Hive Programming Training

This is a beginner-to-advanced training course on Apache Hive. The intensive course combines lectures with hands-on labs that help students learn the underlying theory and gain practical experience with Hive projects.

TOPICS

  • Introduction
  • Architectural View
  • Hive Basics
  • Advanced Hive
  • Extending Hive 

AUDIENCE

Developers/Architects/Team Leads/Data Analysts/Data Scientists 

PREREQUISITES

Participants should have a general knowledge of programming and SQL, as well as experience working in Unix environments (e.g., running shell commands). Participants should also be familiar with HDFS.

DURATION

2 days 

Outline of Hive Programming Training

Chapter 1. Apache Hive

  • Traditional RDBMS Capabilities and TCO
  • What is Hive?
  • Apache Hive Logo
  • Hive's Value Proposition
  • Who uses Hive?
  • What Hive Does Not Have
  • Hive's Main Sub-Systems
  • Hive Features
  • The "Classic" Hive Architecture
  • The New Hive Architecture (HiveServer2)
  • Multi-Client Concurrency in HiveServer2
  • Components
  • Where are the Hive Tables Located?
  • Data Organization in Hive
  • Hive Tables
  • Managed and External Tables
  • Partitions
  • Buckets
  • Buckets and Partitions
  • Buckets Visually
  • Partitions Visually
  • HiveQL
  • The "Classic" Hive Command-line Interface (CLI)
  • The Beeline Command Shell
  • Summary

Chapter 2. Hive Command-line Interface

  • Hive Command-line Interface (CLI)
  • The Hive Interactive Shell
  • Running Host OS Commands from the Hive Shell
  • Interfacing with HDFS from the Hive Shell
  • Running Hive in Unattended Mode
  • Hive CLI Integration with the OS Shell
  • Executing HiveQL Scripts
  • Comments in Hive Scripts
  • Variables and Properties in Hive CLI
  • Setting Properties in CLI
  • Example of Setting Properties in CLI
  • Passing Arguments to Hive Scripts
  • Hive Namespaces
  • Using the SET Command
  • Setting Properties in the Shell
  • Setting Properties for the New Shell Session
  • Setting Alternative Hive Execution Engines
  • The Beeline Shell
  • Connecting to the Hive Server in Beeline
  • Beeline Command Switches
  • Beeline Internal Commands
  • Summary

Chapter 3. Hive Data Definition Language

  • Hive Data Definition Language
  • Creating Databases in Hive
  • Using Databases
  • Creating Tables in Hive
  • Supported Data Type Categories
  • Common Primitive Types
  • String and Date / Time Types
  • Complex Types
  • Miscellaneous Types
  • Example of CREATE TABLE Statement
  • Working with Complex Types
  • Table Partitioning
  • Benefits of Partitions
  • Table Partitioning on Multiple Columns
  • Viewing Table Partitions
  • Bucketed Table DDL
  • Loading Data into Bucketed Table
  • File Format Storage
  • ORC, Parquet, and Avro Binary Data Formats Compared
  • Data Serializers / Deserializers
  • Row Format
  • Visualizing Row Format
  • Row Format with the SerDe Definition
  • A RegexSerDe Example
  • The ORC Data Format
  • Converting Text to ORC Data Format
  • The Parquet Data Storage Format
  • File Compression
  • The EXTERNAL DDL Parameter
  • Example of Using EXTERNAL
  • Features Comparison
  • What Type is my Table?
  • Temporary Tables
  • Creating an Empty Table
  • Dropping a Table
  • Table / Partition(s) Truncation
  • Alter Table/Partition/Column
  • Views
  • Create View Statement
  • Why Use Views?
  • Restricting Amount of Viewable Data
  • Examples of Restricting Amount of Viewable Data
  • Hive Indexing
  • Describing Data
  • Summary

Chapter 4. HiveQL

  • What is HiveQL?
  • HiveQL Main Features
  • Alternative Execution Engines
  • Data Validation
  • Hive Data Manipulation Language (DML)
  • Using the LOAD DATA Statement
  • Examples of Loading Data into a Hive Table
  • Loading Data with the INSERT Statement
  • Appending and Replacing Data with the INSERT Statement
  • Examples of Using the INSERT Statement
  • Multi-Table Inserts
  • Multi-table Inserts Syntax
  • Multi-Table Inserts Example
  • INSERT … DIRECTORY
  • The Skewed Tables Concept
  • A Skewed Tables Example
  • Controlling the Number of Reducers
  • Computing Table Statistics
  • ANALYZE TABLE Command
  • DESCRIBE Command Variants
  • Summary

Chapter 5. Hive Select Statement and Built-In Functions

  • The SELECT Statement Syntax
  • The WHERE Clause
  • Examples of the WHERE Clause
  • Partition-Based Queries
  • Example of Efficient Use of Partitions in a SELECT Statement
  • Create Table As Select Operation
  • Supported Numeric Operators
  • Built-in Mathematical Functions
  • Built-in Aggregate Functions
  • Built-in Statistical Functions
  • Other Useful Built-in Functions
  • The GROUP BY Clause
  • The HAVING Clause
  • The LIMIT Clause
  • The ORDER BY Clause
  • The JOIN Clause
  • Types of Joins
  • The Shuffle Join Visually
  • Map (Broadcast) Join Visually
  • Setting Up the Map Side (Broadcast) Join
  • Sort-Merge-Bucket Join Visually
  • The CASE … Clause
  • Example of CASE … Clause
  • Re-Writing SELECT Statements
  • The TRANSFORM Clause
  • Performance Enhancements with Vectorization and ORC
  • Summary

Chapter 6. Apache HUE

  • What is Apache HUE?
  • HUE Login Page
  • HUE Web UI at a Glance
  • Supported Editors and Dashboards
  • Hive / Impala Query Editor
  • Command Auto-completion and Metastore Look-Ups
  • Parameterizing Queries
  • HUE Configuration
  • Summary

Lab Exercises

Lab 1. Learning the Lab Environment
Lab 2. The Hadoop Distributed File System
Lab 3. The Hive and Beeline Shells
Lab 4. Understanding Tables in Hive
Lab 5. Querying Hive Tables
Lab 6. Extending Hive with UDFs
Lab 7. Partitioned and Skewed Tables in Hive
Lab 8. Working with the Parquet Data Format in Hive
Lab 9. Working with the Avro Data Format in Hive
Lab 10. Working with Regular Expressions in Hive
Lab 11. Working with Indexes in Hive (Optional)

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, and Washington DC.