Providing Technology Training and Mentoring For Modern Technology Adoption
US Inquiries / 1.877.517.6540
Canadian Inquiries / 1.877.812.8887
Course #:WA2610

Introduction to Machine Learning Using Spark Training

Courseware: Available for sale

To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data, using its distributed compute capability and its built-in machine learning library.

This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for Machine Learning. The course is supplemented by a variety of hands-on labs that help attendees reinforce the theory they have learned.

We also offer an R Programming Machine Learning course. R has been steadily gaining popularity with business analysts, statisticians and data scientists as a tool of choice for conducting statistical analysis of data as well as supervised and unsupervised machine learning.

Sign up today for one of our instructor-led Machine Learning classes and learn Machine Learning now.

Our Machine Learning courses continue to be in high demand. The popular courses are:

  • Machine Learning with Apache Spark Training
  • Learn Data Science, Statistics, and Machine Learning using Python
  • R Programming Training

Our Machine Learning course, Machine Learning with Apache Spark Training, covers the following topics:

  • Machine learning algorithms
  • Introduction to functional programming
  • Introduction to Apache Spark
  • The Spark Shell
  • The Spark Machine Learning Library
  • Text mining

Our Machine Learning course, R Programming Training, covers the following topics:

  • Working with R
  • R Syntax
  • R Data Structures
  • Functions
  • Control Statements
  • Input / Output
  • Data Import and Export
  • R Statistical Computing Features
  • Data Visualization
  • Data Science Algorithms and Analytical Methods

You can also learn Machine Learning from our popular machine learning webinars.

Web Age Machine Learning Training can be delivered in a traditional classroom-style format. You can also learn Machine Learning via our courses delivered in a synchronous instructor-led format.


  • Applied Data Science and Business Analytics
  • Machine Learning Algorithms, Techniques and Common Analytical Methods
  • Apache Spark Introduction
  • Spark’s MLlib Machine Learning Library

This Apache Spark training course has 3 hands-on labs that are outlined at the bottom of this page. The labs cover the spark-submit tool as well as the Apache Spark shell, and they allow you to practice the following skills:

Lab 1 - Using the spark-submit Tool

Spark offers developers two ways of running their applications:

  • Using the spark-submit tool
  • Using Spark Shell

In this lab, we will review what is involved in using the spark-submit tool.

Lab 2 - The Apache Spark Shell

Spark's interactive development environment is the Spark Shell (also known as a REPL: Read/Eval/Print Loop tool), which is available for Scala and Python developers (Java is not yet supported).
The lab instructions apply to the Scala version of the Spark Shell.

Lab 3 - Using Random Forests for Classification with Spark MLlib

In this lab, we will learn how to use the Random Forests implementation from Spark's Machine Learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms, and it can be used for both classification and regression.
In our work we will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
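
To give a feel for the idea behind Random Forests before the lab, here is a minimal pure-Python sketch. It is not MLlib's implementation and does not use the MLlib API; every name in it (train_stump, train_forest, predict) is hypothetical. As a simplifying assumption, each "tree" is a single-split stump trained on a bootstrap sample and one randomly chosen feature, and the forest classifies by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def train_stump(rows, feature):
    """Fit a one-split tree on one feature.

    Returns (feature, threshold, left_label, right_label).
    """
    values = sorted({features[feature] for features, _ in rows})
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    # Fallback when the sample has a single distinct value for this feature.
    best = (feature, values[0], majority, majority)
    best_score = float("inf")
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lbl for feats, lbl in rows if feats[feature] <= threshold]
        right = [lbl for feats, lbl in rows if feats[feature] > threshold]
        # Weighted impurity of the two sides; lower is better.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_score = score
            best = (feature, threshold,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best


def train_forest(rows, num_trees=15, seed=42):
    """Train stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    num_features = len(rows[0][0])
    forest = []
    for _ in range(num_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(train_stump(sample, rng.randrange(num_features)))
    return forest


def predict(forest, features):
    """Classify by majority vote over all stumps."""
    votes = Counter(
        left if features[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Made-up, well-separated training data: (features, label).
rows = ([((x, x + 0.5), 0) for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5)]
        + [((x, x - 0.5), 1) for x in (7.5, 8.0, 8.5, 9.0, 9.5, 10.0)])
forest = train_forest(rows)
print(predict(forest, (1.0, 1.0)), predict(forest, (9.0, 9.0)))
```

In the lab itself the training and voting are handled by MLlib; the sketch only shows why combining many weak, randomized trees can give a stable classifier.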

The Web Age Spark class can be delivered in a traditional classroom-style format. This Apache Spark Training can also be delivered in a synchronous instructor-led format.


Audience: Data Scientists, Business Analysts, Software Developers, IT Architects


Prerequisites: Participants should have a general knowledge of statistics and programming.


Duration: 1 Day

Outline of Introduction to Machine Learning Using Spark Training

Chapter 1. Machine Learning Algorithms

  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • The Error Rate
  • Decision Trees (SL)
  • Random Forests
  • Unsupervised Learning Type: Clustering
  • K-Means Clustering (UL)
  • K-Means Clustering in a Nutshell
  • Regression Analysis
  • Logistic Regression
  • Summary
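
As a taste of this chapter, the following is a small pure-Python sketch of k-Nearest Neighbors classification and the error rate. The function names and the data are made up for illustration, and the sketch assumes Euclidean distance and a simple majority vote; it is not the course's lab code.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors.

    train is a list of (features, label) pairs.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def error_rate(train, test, k=3):
    """Fraction of test points whose predicted label differs from the given one."""
    wrong = sum(1 for features, label in test
                if knn_predict(train, features, k) != label)
    return wrong / len(test)


train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

# The last test point is deliberately labeled "a" although it sits in the
# "b" cluster, so the classifier gets it "wrong".
test = [((1.1, 1.0), "a"), ((5.1, 5.0), "b"), ((4.9, 5.2), "a")]

print(knn_predict(train, (1.1, 1.0)))
print(error_rate(train, test))
```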

Chapter 2. Introduction to Functional Programming

  • What is Functional Programming (FP)?
  • Terminology: Higher-Order Functions
  • Terminology: Lambda vs Closure
  • A Short List of Languages that Support FP
  • FP with Java
  • FP with JavaScript
  • Imperative Programming in JavaScript
  • The JavaScript map (FP) Example
  • The JavaScript reduce (FP) Example
  • Using reduce to Flatten an Array of Arrays (FP) Example
  • The JavaScript filter (FP) Example
  • Common Higher-Order Functions in Python
  • Common Higher-Order Functions in Scala
  • Elements of FP in R
  • Summary
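
The map, filter and reduce topics listed above can be sketched in Python roughly as follows. This is an illustrative example with made-up data, not an excerpt from the courseware; make_multiplier is a hypothetical helper included to show a closure.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every element.
squares = list(map(lambda x: x * x, numbers))

# filter: keep only the elements that satisfy a predicate.
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce: fold a sequence into a single value.
total = reduce(lambda acc, x: acc + x, numbers, 0)

# reduce can also flatten a list of lists.
nested = [[1, 2], [3], [4, 5]]
flat = reduce(lambda acc, xs: acc + xs, nested, [])


def make_multiplier(factor):
    """Return a closure that remembers `factor`."""
    def multiply(x):
        return x * factor
    return multiply


# A higher-order function (map) taking a closure as its argument.
tripled = list(map(make_multiplier(3), numbers))

print(squares, evens, total, flat, tripled)
```

The same three operations appear throughout Spark's own API, which is why the course introduces them before Spark itself.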

Chapter 3. Introduction to Apache Spark

  • What is Apache Spark
  • A Short History of Spark
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Applications
  • Spark Shell
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • The Executor and Worker Processes
  • The Spark Application Architecture
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • Spark as an Alternative to Apache Tez
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • GraphX
  • Spark vs R
  • Summary

Chapter 4. The Spark Shell

  • The Spark Shell
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • The Spark Context (sc) and SQL Context (sqlContext)
  • The Shell Spark Context
  • Loading Files
  • Saving Files
  • Basic Spark ETL Operations
  • Summary

Chapter 5. The Spark Machine Learning Library

  • What is MLlib?
  • Supported Languages
  • MLlib Packages
  • Dense and Sparse Vectors
  • Labeled Point
  • Python Example of Using the LabeledPoint Class
  • LIBSVM format
  • An Example of a LIBSVM File
  • Loading LIBSVM Files
  • Local Matrices
  • Example of Creating Matrices in MLlib
  • Distributed Matrices
  • Example of Using a Distributed Matrix
  • Classification and Regression Algorithms
  • Clustering
  • Summary
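
To illustrate the LIBSVM format and the sparse versus dense vector distinction covered in this chapter, here is a small pure-Python sketch. It assumes the usual LIBSVM text layout of "label index:value index:value ..." with one-based indices. MLlib ships its own loader for such files; the helper names below (parse_libsvm_line, to_dense) are hypothetical and are not part of MLlib.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, {zero_based_index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # LIBSVM indices are one-based; shift them to zero-based here.
        features[int(index) - 1] = float(value)
    return label, features


def to_dense(features, size):
    """Expand a sparse {index: value} mapping into a dense list of floats."""
    dense = [0.0] * size
    for index, value in features.items():
        dense[index] = value
    return dense


# Two made-up labeled points with four features each.
sample = """\
1 1:0.5 3:2.0
0 2:1.5 4:-1.0
"""

points = [parse_libsvm_line(line) for line in sample.splitlines() if line.strip()]
for label, features in points:
    print(label, features, to_dense(features, 4))
```

The sparse form stores only the non-zero entries, which is what makes the format practical for data sets with very many features.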

Chapter 6. Text Mining

  • What is Text Mining?
  • The Common Text Mining Tasks
  • What is Natural Language Processing (NLP)?
  • Some of the NLP Use Cases
  • Machine Learning in Text Mining and NLP
  • Machine Learning in NLP
  • TF-IDF
  • The Feature Hashing Trick
  • Stemming
  • Example of Stemming
  • Stop Words
  • Popular Text Mining and NLP Libraries and Packages
  • Summary
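
The stop word, TF-IDF and feature hashing topics above can be sketched in pure Python as follows. The stop word list, the documents and the function names are made up for illustration. The smoothed IDF form log((N + 1) / (df + 1)) and the use of CRC32 as the hash function are assumptions of this sketch, not a statement of how any particular library computes them.

```python
import math
import zlib
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def hashed_term_frequencies(tokens, num_features=16):
    """Feature hashing: each term is counted in bucket hash(term) mod size."""
    vector = [0] * num_features
    for token in tokens:
        vector[zlib.crc32(token.encode("utf-8")) % num_features] += 1
    return vector


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((n + 1) / (df + 1)) for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()}
            for tokens in tokenized]


docs = ["the spark shell is a repl",
        "spark mllib is a library",
        "the shell of the sea"]

weights = tf_idf(docs)
print(weights[0])
print(hashed_term_frequencies(tokenize(docs[0])))
```

Terms that occur in fewer documents, such as "repl" here, receive a higher weight than terms shared across documents, such as "spark".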

Lab Exercises

Lab 1. Learning the Lab Environment
Lab 2. The Spark Shell
Lab 3. Using Random Forests for Classification with Spark MLlib
Lab 4. Using k-means Algorithm from MLlib
Lab 5. Text Classification with Spark ML Pipeline

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.