Machine Learning with Apache Spark Training

Course#: WA2610
Courseware: Available for sale

To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed compute capability and built-in machine learning library.

This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for Machine Learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce the theory covered in class.

TOPICS

  • Applied Data Science and Business Analytics
  • Machine Learning Algorithms, Techniques and Common Analytical Methods
  • Apache Spark Introduction
  • Spark’s MLlib Machine Learning Library

This Apache Spark training course includes hands-on labs, which are listed in full at the bottom of this page. The labs cover the spark-submit tool as well as the Apache Spark shell, and allow you to practice the following skills:

Lab 1 - Using the spark-submit Tool

Spark offers developers two ways of running their applications:

  • Using the spark-submit tool
  • Using Spark Shell

In this lab, we will review what is involved in using the spark-submit tool.
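
As an illustration of what a spark-submit invocation involves, the Python sketch below assembles a command line. The `--master`, `--name`, and `--conf` options belong to spark-submit; the script name `classify.py`, the application name `rf-lab`, and the input file `data.txt` are hypothetical placeholders, not taken from the lab.

```python
import shlex


def build_spark_submit(app_path, master="local[2]", name=None, conf=None, app_args=()):
    """Return a spark-submit command as a list of arguments.

    All values passed in below are illustrative assumptions.
    """
    cmd = ["spark-submit", "--master", master]
    if name:
        cmd += ["--name", name]
    # Each Spark property is passed as a separate --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application file comes after the options, followed by its own arguments.
    cmd.append(app_path)
    cmd += list(app_args)
    return cmd


cmd = build_spark_submit(
    "classify.py",                           # hypothetical PySpark script
    name="rf-lab",                           # hypothetical application name
    conf={"spark.executor.memory": "1g"},
    app_args=["data.txt"],                   # hypothetical input file
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the job, assuming Spark is installed and `spark-submit` is on the PATH.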

Lab 2 - The Apache Spark Shell

Spark provides an interactive development environment through the Spark Shell (also known as a REPL: Read/Eval/Print Loop), which is available to Scala and Python developers (Java is not yet supported).
The lab instructions below apply to the Scala version of the Spark Shell.

Lab 3 - Using Random Forests for Classification with Spark MLlib

In this lab, we will learn how to use the Random Forests implementation in Spark's Machine Learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression.
In our work we will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
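
To give a feel for the idea behind Random Forests before the lab, the sketch below is a toy, pure-Python illustration: each "tree" is a one-level decision stump trained on a bootstrap sample and a randomly chosen feature, and the forest predicts by majority vote. It is not MLlib's implementation (MLlib grows full trees on distributed data), and the data points, function names, and parameters are made up for illustration.

```python
import random
from collections import Counter


def train_stump(rows, feature):
    """Find the threshold on one feature that makes the fewest training errors.

    rows is a list of (features, label) pairs with labels 0 or 1.
    """
    best = None
    for candidate, _ in rows:
        threshold = candidate[feature]
        # Try both orientations: which label goes on the "<= threshold" side.
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                1 for f, y in rows
                if (left if f[feature] <= threshold else right) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return (feature, threshold, left, right)


def predict_stump(stump, features):
    feature, threshold, left, right = stump
    return left if features[feature] <= threshold else right


def train_forest(rows, num_trees=5, seed=42):
    """Train stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    num_features = len(rows[0][0])
    forest = []
    for _ in range(num_trees):
        sample = [rng.choice(rows) for _ in rows]   # sample with replacement
        feature = rng.randrange(num_features)       # random feature choice
        forest.append(train_stump(sample, feature))
    return forest


def predict_forest(forest, features):
    """Classify by majority vote over all stumps."""
    votes = Counter(predict_stump(stump, features) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up, clearly separated training data: (features, label).
data = [
    ((1, 1), 0), ((2, 1), 0), ((1, 2), 0), ((2, 2), 0),
    ((8, 9), 1), ((9, 8), 1), ((8, 8), 1), ((9, 9), 1),
]
forest = train_forest(data, num_trees=5)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (9, 9)))
```

An odd number of stumps is used so that a two-class vote cannot tie.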

The Web Age Spark class can be delivered in a traditional classroom format. This Apache Spark training can also be delivered in a synchronous instructor-led format.

AUDIENCE

Data Scientists, Business Analysts, Software Developers, IT Architects

PREREQUISITES

Participants should have general knowledge of statistics and programming.

DURATION

1 Day

Outline of WA2610 Machine Learning with Apache Spark Training

Chapter 1. Machine Learning Algorithms

  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • The Error Rate
  • Decision Trees (SL)
  • Random Forests
  • Unsupervised Learning Type: Clustering
  • K-Means Clustering (UL)
  • K-Means Clustering in a Nutshell
  • Regression Analysis
  • Logistic Regression
  • Summary
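
The k-Nearest Neighbors and error rate topics above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up data and hypothetical function names, not the course's lab code: a point is classified by majority vote among its k closest training points, and the error rate is the fraction of test points classified incorrectly.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Return the most common label among the k nearest training points."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def error_rate(train, test, k=3):
    """Fraction of test points whose predicted label differs from the true one."""
    wrong = sum(
        1 for features, label in test
        if knn_predict(train, features, k) != label
    )
    return wrong / len(test)


# Made-up training data: (features, label).
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
# The last test point is deliberately labeled against its neighborhood.
test = [((1.5, 1.5), "a"), ((8.5, 8.5), "b"), ((2, 2), "b")]

print(knn_predict(train, (1.5, 1.5)))
print(error_rate(train, test))
```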

Chapter 2. Introduction to Functional Programming

  • What is Functional Programming (FP)?
  • Terminology: Higher-Order Functions
  • Terminology: Lambda vs Closure
  • A Short List of Languages that Support FP
  • FP with Java
  • FP With JavaScript
  • Imperative Programming in JavaScript
  • The JavaScript map (FP) Example
  • The JavaScript reduce (FP) Example
  • Using reduce to Flatten an Array of Arrays (FP) Example
  • The JavaScript filter (FP) Example
  • Common Higher-Order Functions in Python
  • Common Higher-Order Functions in Scala
  • Elements of FP in R
  • Summary
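
As a small illustration of the higher-order functions named in this chapter, the sketch below shows Python counterparts of the map, filter, and reduce examples, including flattening a list of lists with reduce, plus a closure. The variable and function names are chosen only for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every element.
squares = list(map(lambda n: n * n, numbers))          # [1, 4, 9, 16, 25]

# filter: keep only the elements that satisfy a predicate.
evens = list(filter(lambda n: n % 2 == 0, numbers))    # [2, 4]

# reduce: fold a sequence into a single value.
total = reduce(lambda acc, n: acc + n, numbers, 0)     # 15

# reduce can also flatten a list of lists by concatenating as it folds.
nested = [[1, 2], [3], [4, 5]]
flat = reduce(lambda acc, part: acc + part, nested, [])  # [1, 2, 3, 4, 5]


def make_scaler(factor):
    """Return a closure: the lambda captures `factor` from this call."""
    return lambda n: n * factor


triple = make_scaler(3)
print(squares, evens, total, flat, triple(7))
```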

Chapter 3. Introduction to Apache Spark

  • What is Apache Spark
  • A Short History of Spark
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Applications
  • Spark Shell
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • The Executor and Worker Processes
  • The Spark Application Architecture
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • Spark as an Alternative to Apache Tez
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • GraphX
  • Spark vs R
  • Summary

Chapter 4. The Spark Shell

  • The Spark Shell
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • The Spark Context (sc) and SQL Context (sqlContext)
  • The Shell Spark Context
  • Loading Files
  • Saving Files
  • Basic Spark ETL Operations
  • Summary

Chapter 5. The Spark Machine Learning Library

  • What is MLlib?
  • Supported Languages
  • MLlib Packages
  • Dense and Sparse Vectors
  • Labeled Point
  • Python Example of Using the LabeledPoint Class
  • LIBSVM format
  • An Example of a LIBSVM File
  • Loading LIBSVM Files
  • Local Matrices
  • Example of Creating Matrices in MLlib
  • Distributed Matrices
  • Example of Using a Distributed Matrix
  • Classification and Regression Algorithms
  • Clustering
  • Summary
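
To make the LIBSVM format and the sparse versus dense vector topics concrete, the sketch below parses one LIBSVM-style line in plain Python. Each line has the form `label index:value index:value ...` with one-based feature indices; the parser shifts them to zero-based, as MLlib does when it loads such files. The function names are hypothetical and no MLlib classes are used.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, sparse feature dict)."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # LIBSVM indices start at 1; store them zero-based.
        features[int(index) - 1] = float(value)
    return label, features


def to_dense(features, size):
    """Expand a sparse {index: value} mapping into a dense list of floats."""
    dense = [0.0] * size
    for index, value in features.items():
        dense[index] = value
    return dense


# A made-up example line: label 1, features 1 and 3 are non-zero.
label, sparse = parse_libsvm_line("1 1:0.5 3:2.0")
print(label, sparse, to_dense(sparse, 3))
```

The sparse form stores only the non-zero entries, which is why it suits data where most features are zero.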

Chapter 6. Text Mining

  • What is Text Mining?
  • The Common Text Mining Tasks
  • What is Natural Language Processing (NLP)?
  • Some of the NLP Use Cases
  • Machine Learning in Text Mining and NLP
  • Machine Learning in NLP
  • TF-IDF
  • The Feature Hashing Trick
  • Stemming
  • Example of Stemming
  • Stop Words
  • Popular Text Mining and NLP Libraries and Packages
  • Summary
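
The TF-IDF and feature hashing topics can be illustrated together in plain Python. In the sketch below, each token is hashed into one of a fixed number of buckets (the hashing trick), term frequencies are counted per bucket, and a smoothed inverse document frequency of the form log((m + 1) / (df + 1)) is applied, where m is the number of documents and df the number of documents containing the bucket. The documents, function names, bucket count, and the use of `zlib.crc32` as the hash are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def hashed_tf(tokens, num_features=1024):
    """Count term frequencies per hash bucket instead of per word."""
    counts = Counter()
    for token in tokens:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        bucket = zlib.crc32(token.encode("utf-8")) % num_features
        counts[bucket] += 1
    return counts


def idf_weights(tf_vectors):
    """Smoothed IDF per bucket: log((m + 1) / (df + 1))."""
    m = len(tf_vectors)
    doc_freq = Counter()
    for tf in tf_vectors:
        doc_freq.update(tf.keys())   # each document counts once per bucket
    return {bucket: math.log((m + 1) / (df + 1)) for bucket, df in doc_freq.items()}


def tf_idf(tf, idf):
    """Scale each bucket's term frequency by its IDF weight."""
    return {bucket: count * idf[bucket] for bucket, count in tf.items()}


# Made-up documents, tokenized by whitespace.
docs = ["spark runs fast", "spark mllib trains models", "spark shell"]
tf_vectors = [hashed_tf(doc.split()) for doc in docs]
idf = idf_weights(tf_vectors)
weighted = [tf_idf(tf, idf) for tf in tf_vectors]
print(weighted[0])
```

A word that occurs in every document, such as "spark" here, receives a weight of zero, which is how TF-IDF discounts terms that do not help tell documents apart.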

Lab Exercises

Lab 1. Learning the Lab Environment
Lab 2. The Spark Shell
Lab 3. Using Random Forests for Classification with Spark MLlib
Lab 4. Using k-means Algorithm from MLlib
Lab 5. Text Classification with Spark ML Pipeline

Address                  Start Date   End Date
Instructor Led Virtual   08/28/2017   08/28/2017
Instructor Led Virtual   09/06/2017   09/06/2017
We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.

08/28/2017 - Online Virtual: $695.00
09/06/2017 - Online Virtual: $695.00
