Hive Programming Training

This beginner-to-advanced training course on Apache Hive combines lectures with hands-on labs, giving students both the theoretical grounding and the practical experience needed to work on Hive projects.


Course Details

Duration

2 days

Prerequisites

General knowledge of programming and SQL
Experience working in Unix environments (e.g., running shell commands)
Familiarity with HDFS

Target Audience

Developers
Architects
Team Leads
Data Analysts
Data Scientists

Course Outline

Apache Hive
- Traditional RDBMS Capabilities and TCO
- What is Hive?
- Apache Hive Logo
- Hive's Value Proposition
- Who uses Hive?
- What Hive Does Not Have
- Hive's Main Sub-Systems
- Hive Features
- The "Classic" Hive Architecture
- The New Hive Architecture (HiveServer2)
- Multi-Client Concurrency in HiveServer2
- Components
- Where are the Hive Tables Located?
- Data Organization in Hive
- Hive Tables
- Managed and External Tables
- Partitions
- Buckets
- Buckets and Partitions
- Buckets Visually
- Partitions Visually
- HiveQL
- The "Classic" Hive Command-line Interface (CLI)
- The Beeline Command Shell
Hive Command-line Interface
- Hive Command-line Interface (CLI)
- The Hive Interactive Shell
- Running Host OS Commands from the Hive Shell
- Interfacing with HDFS from the Hive Shell
- Running the Hive CLI in Unattended Mode
- Hive CLI Integration with the OS Shell
- Executing HiveQL Scripts
- Comments in Hive Scripts
- Variables and Properties in the Hive CLI
- Setting Properties in the CLI
- Passing Arguments to Hive Scripts
- Hive Namespaces
- Using the SET Command
- Setting Properties in the Shell
- Setting Properties for a New Shell Session
- Setting Alternative Hive Execution Engines
- The Beeline Shell
- Connecting to the Hive Server in Beeline
- Beeline Command Switches
- Beeline Internal Commands
Hive Data Definition Language
- Hive Data Definition Language
- Creating Databases in Hive
- Using Databases
- Creating Tables in Hive
- Supported Data Type Categories
- Common Primitive Types
- String and Date / Time Types
- Complex Types
- Miscellaneous Types
- Example of CREATE TABLE Statement
- Working with Complex Types
- Table Partitioning
- Benefits of Partitioning
- Table Partitioning on Multiple Columns
- Viewing Table Partitions
- Bucketed Table DDL
- Loading Data into a Bucketed Table
- File Format Storage
- ORC, Parquet, and Avro Binary Data Formats Compared
- Data Serializers / Deserializers
- Row Format
- Visualizing Row Format
- Row Format with the SerDe Definition
- A RegexSerDe Example
- The ORC Data Format
- Converting Text to ORC Data Format
- The Parquet Data Storage Format
- File Compression
- The EXTERNAL DDL Parameter
- Features Comparison
- What Type is my Table?
- Temporary Tables
- Creating an Empty Table
- Dropping a Table
- Table / Partition(s) Truncation
- Alter Table/Partition/Column
- Views
- Create View Statement
- Why Use Views?
- Restricting the Amount of Viewable Data
- Examples of Restricting the Amount of Viewable Data
- Hive Indexing
- Describing Data
HiveQL
- What is HiveQL?
- HiveQL Main Features
- Alternative Execution Engines
- Data Validation
- Hive Data Manipulation Language (DML)
- Using the LOAD DATA Statement
- Loading Data with the INSERT Statement
- Appending and Replacing Data with the INSERT Statement
- Multi-Table Inserts
- Multi-Table Insert Syntax
- Multi-Table Inserts Example
- INSERT … DIRECTORY
- The Skewed Tables Concept
- A Skewed Tables Example
- Controlling the Number of Reducers
- Computing Table Statistics
- ANALYZE TABLE Command
- DESCRIBE Command Variants
Hive Select Statement and Built-In Functions
- The SELECT Statement Syntax
- The WHERE Clause
- Examples of the WHERE Clause
- Partition-Based Queries
- Create Table As Select Operation
- Supported Numeric Operators
- Built-in Mathematical Functions
- Built-in Aggregate Functions
- Built-in Statistical Functions
- Other Useful Built-in Functions
- The GROUP BY Clause
- The HAVING Clause
- The LIMIT Clause
- The ORDER BY Clause
- The JOIN Clause
- Types of Joins
- The Shuffle Join Visually
- Map (Broadcast) Join Visually
- Setting Up the Map Side (Broadcast) Join
- Sort-Merge-Bucket Join Visually
- The CASE … Clause
- Re-Writing SELECT Statements
- The TRANSFORM Clause
- Performance Enhancements with Vectorization + ORC
Apache HUE
- What is Apache HUE?
- HUE Login Page
- HUE Web UI at a Glance
- Supported Editors and Dashboards
- Hive / Impala Query Editor
- Command Auto-completion and Metastore Look-Ups
- Parameterizing Queries
- Hue Configuration
Lab Exercises
- Lab 1. Learning the Lab Environment
- Lab 2. The Hadoop Distributed File System
- Lab 3. The Hive and Beeline Shells
- Lab 4. Understanding Tables in Hive
- Lab 5. Querying Hive Tables
- Lab 6. Extending Hive with UDFs
- Lab 7. Partitioned and Skewed Tables in Hive
- Lab 8. Working with the Parquet Data Format in Hive
- Lab 9. Working with the Avro Data Format in Hive
- Lab 10. Working with Regular Expressions in Hive
- Lab 11. Working with Indexes in Hive (Optional)

Upcoming Course Dates

All sessions below are Online Virtual Classes, held 10 AM - 6 PM ET, priced at USD $1,570.

- Aug 5 - 6, 2024
- Aug 26 - 27, 2024
- Sep 2 - 3, 2024
- Sep 30 - Oct 1, 2024
- Nov 4 - 5, 2024
