Course #: WA2622
Hadoop Programming on the Hortonworks Data Platform for Managers Training

06/07/2021 - 06/08/2021 USD$1,295.00 Instructor Led Virtual
07/26/2021 - 07/27/2021 USD$1,295.00 Instructor Led Virtual

This training course introduces students to Apache Hadoop and key Hadoop ecosystem projects: Pig, Hive, Sqoop, and Spark. The course is supplemented by a variety of hands-on labs that help attendees reinforce their theoretical knowledge of the material and gain practical experience working with Apache Hadoop and related Apache projects.

AUDIENCE
Managers, Business Analysts, and IT Architects.

PREREQUISITES
Participants should have general knowledge of programming.

DURATION
2 Days

Outline of Hadoop Programming on the Hortonworks Data Platform for Managers Training

Chapter 1. MapReduce Overview
  The Client – Server Processing Pattern
  Distributed Computing Challenges
  MapReduce Defined
  Google's MapReduce
  The Map Phase of MapReduce
  The Reduce Phase of MapReduce
  MapReduce Explained
  MapReduce Word Count Job
  MapReduce Shared-Nothing Architecture
  Similarity with SQL Aggregation Operations
  Example of Map & Reduce Operations using JavaScript
  Problems Suitable for Solving with MapReduce
  Typical MapReduce Jobs
  Fault-tolerance of MapReduce
  Distributed Computing Economics
  MapReduce Systems
  Summary

Chapter 2. Hadoop Overview
  Apache Hadoop
  Apache Hadoop Logo
  Typical Hadoop Applications
  Hadoop Clusters
  Hadoop Design Principles
  Hadoop Versions
  Hadoop's Main Components
  Hadoop Simple Definition
  Side-by-Side Comparison: Hadoop 1 and Hadoop 2
  Hadoop-based Systems for Data Analysis
  Other Hadoop Ecosystem Projects
  Hadoop Caveats
  Hadoop Distributions
  Cloudera Distribution of Hadoop (CDH)
  Cloudera Distributions
  Hortonworks Data Platform (HDP)
  MapR
  Summary

Chapter 3. Hadoop Distributed File System Overview
  Hadoop Distributed File System (HDFS)
  HDFS High Availability
  HDFS "Fine Print"
  Storing Raw Data in HDFS
  Hadoop Security
  HDFS Rack-awareness
  Data Blocks
  Data Block Replication Example
  HDFS NameNode Directory Diagram
  Accessing HDFS
  Examples of HDFS Commands
  Other Supported File Systems
  WebHDFS
  Examples of WebHDFS Calls
  Client Interactions with HDFS for the Read Operation
  Read Operation Sequence Diagram
  Client Interactions with HDFS for the Write Operation
  Communication inside HDFS
  Summary

Chapter 4. Apache Pig Scripting Platform
  What is Pig?
  Pig Latin
  Apache Pig Logo
  Pig Execution Modes
  Local Execution Mode
  MapReduce Execution Mode
  Running Pig
  Running Pig in Batch Mode
  What is Grunt?
  Pig Latin Statements
  Pig Programs
  Pig Latin Script Example
  SQL Equivalent
  Differences between Pig and SQL
  Statement Processing in Pig
  Comments in Pig
  Supported Simple Data Types
  Supported Complex Data Types
  Arrays
  Defining Relation's Schema
  Not Matching the Defined Schema
  The bytearray Generic Type
  Using Field Delimiters
  Loading Data with TextLoader()
  Referencing Fields in Relations
  Summary

Chapter 5. Apache Pig HDFS Interface
  The HDFS Interface
  FSShell Commands (Short List)
  Grunt's Old File System Commands
  Summary

Chapter 6. Apache Pig Relational and Eval Operators
  Pig Relational Operators
  Example of Using the JOIN Operator
  Example of Using the Order By Operator
  Caveats of Using Relational Operators
  Pig Eval Functions
  Caveats of Using Eval Functions (Operators)
  Example of Using Single-column Eval Operations
  Example of Using Eval Operators For Global Operations
  Summary

Chapter 7. Hive
  What is Hive?
  Apache Hive Logo
  Hive's Value Proposition
  Who uses Hive?
  Hive's Main Sub-Systems
  Hive Features
  The "Classic" Hive Architecture
  The New Hive Architecture
  HiveQL
  Where are the Hive Tables Located?
  Hive Command-line Interface (CLI)
  The Beeline Command Shell
  Summary

Chapter 8. Hive Command-line Interface
  Hive Command-line Interface (CLI)
  The Hive Interactive Shell
  Running Host OS Commands from the Hive Shell
  Interfacing with HDFS from the Hive Shell
  The Hive in Unattended Mode
  The Hive CLI Integration with the OS Shell
  Executing HiveQL Scripts
  Comments in Hive Scripts
  Variables and Properties in Hive CLI
  Setting Properties in CLI
  Example of Setting Properties in CLI
  Hive Namespaces
  Using the SET Command
  Setting Properties in the Shell
  Setting Properties for the New Shell Session
  Setting Alternative Hive Execution Engines
  The Beeline Shell
  Connecting to the Hive Server in Beeline
  Beeline Command Switches
  Beeline Internal Commands
  Summary

Chapter 9. Hive Data Definition Language
  Hive Data Definition Language
  Creating Databases in Hive
  Using Databases
  Creating Tables in Hive
  Supported Data Type Categories
  Common Numeric Types
  String and Date / Time Types
  Miscellaneous Types
  Example of the CREATE TABLE Statement
  Working with Complex Types
  Table Partitioning
  Table Partitioning
  Table Partitioning on Multiple Columns
  Viewing Table Partitions
  Row Format
  Data Serializers / Deserializers
  File Format Storage
  File Compression
  More on File Formats
  The ORC Data Format
  Converting Text to ORC Data Format
  The EXTERNAL DDL Parameter
  Example of Using EXTERNAL
  Creating an Empty Table
  Dropping a Table
  Table / Partition(s) Truncation
  Alter Table/Partition/Column
  Views
  Create View Statement
  Why Use Views?
  Restricting Amount of Viewable Data
  Examples of Restricting Amount of Viewable Data
  Creating and Dropping Indexes
  Describing Data
  Summary

Chapter 10. Hive Data Manipulation Language
  Hive Data Manipulation Language (DML)
  Using the LOAD DATA statement
  Example of Loading Data into a Hive Table
  Loading Data with the INSERT Statement
  Appending and Replacing Data with the INSERT Statement
  Examples of Using the INSERT Statement
  Multi Table Inserts
  Multi Table Inserts Syntax
  Multi Table Inserts Example
  Summary

Chapter 11. Apache Sqoop
  What is Sqoop?
  Apache Sqoop Logo
  Sqoop Import / Export
  Sqoop Help
  Examples of Using Sqoop Commands
  Data Import Example
  Fine-tuning Data Import
  Controlling the Number of Import Processes
  Data Splitting
  Helping Sqoop Out
  Example of Executing Sqoop Load in Parallel
  A Word of Caution: Avoid Complex Free-Form Queries
  Using Direct Export from Databases
  Example of Using Direct Export from MySQL
  More on Direct Mode Import
  Changing Data Types
  Example of Default Types Overriding
  File Formats
  The Apache Avro Serialization System
  Binary vs Text
  More on the SequenceFile Binary Format
  Generating the Java Table Record Source Code
  Data Export from HDFS
  Export Tool Common Arguments
  Data Export Control Arguments
  Data Export Example
  Using a Staging Table
  INSERT and UPDATE Statements
  INSERT Operations
  UPDATE Operations
  Example of the Update Operation
  Failed Exports
  Sqoop2
  Sqoop2 Architecture
  Summary

Chapter 12. Introduction to Apache Spark
  What is Apache Spark?
  A Short History of Spark
  Where to Get Spark?
  The Spark Platform
  Spark Logo
  Common Spark Use Cases
  Languages Supported by Spark
  Running Spark on a Cluster
  The Driver Process
  Spark Applications
  Spark Shell
  The spark-submit Tool
  The spark-submit Tool Configuration
  The Executor and Worker Processes
  The Spark Application Architecture
  Interfaces with Data Storage Systems
  Limitations of Hadoop's MapReduce
  Spark vs MapReduce
  Spark as an Alternative to Apache Tez
  The Resilient Distributed Dataset (RDD)
  Spark Streaming (Micro-batching)
  Spark SQL
  Example of Spark SQL
  Spark Machine Learning Library
  GraphX
  Spark vs R
  Summary

Chapter 13. The Spark Shell
  The Spark Shell
  The Spark Shell UI
  Spark Shell Options
  Getting Help
  The Spark Context (sc) and SQL Context (sqlContext)
  The Shell Spark Context
  Loading Files
  Saving Files
  Basic Spark ETL Operations
  Summary

Lab Exercises
  Lab 1. Learning the Lab Environment
  Lab 2. Getting Started with Apache Ambari
  Lab 3. The Hadoop Distributed File System
  Lab 4. Getting Started with Apache Pig
  Lab 5. Working with Data Sets in Apache Pig
  Lab 6. The Hive and Beeline Shells
  Lab 7. Hive Data Definition Language
  Lab 8. The Spark Shell

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.