- Applied Data Science and Business Analytics
- Machine Learning Algorithms, Techniques and Common Analytical Methods
- Apache Spark Introduction
- Spark’s MLlib Machine Learning Library
This Apache Spark training course includes three hands-on labs, outlined below. The labs cover the spark-submit tool, the Apache Spark Shell, and Random Forests classification with MLlib, and let you practice the following skills:
Lab 1 - Using the spark-submit Tool
Spark offers developers two ways of running their applications:
- Using the spark-submit tool
- Using Spark Shell
In this lab, we will review what is involved in using the spark-submit tool.
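As a rough illustration of what a spark-submit invocation involves, the sketch below assembles the command line in Python without running it. The helper name `build_spark_submit_command`, the script name `my_app.py`, and the `local[2]` master are hypothetical choices for this example; `--master` and `--name` are standard spark-submit options.

```python
import shlex


def build_spark_submit_command(app_path, master="local[2]", app_name=None, app_args=()):
    """Return the argument list for a spark-submit call (the command is not executed)."""
    cmd = ["spark-submit", "--master", master]
    if app_name:
        cmd += ["--name", app_name]
    cmd.append(app_path)   # application script or JAR to submit
    cmd.extend(app_args)   # remaining arguments are passed through to the application
    return cmd


# Hypothetical application and input file, used only for illustration.
cmd = build_spark_submit_command("my_app.py", app_args=["input.txt"])

# Print a shell-safe version of the command that could be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in cmd))
```

On a machine with Spark installed and `spark-submit` on the PATH, the same list could be passed to `subprocess.run(cmd, check=True)` to launch the application.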
Lab 2 - The Apache Spark Shell
Spark's interactive development environment is the Spark Shell, a REPL (Read/Eval/Print Loop) tool available to Scala and Python developers (Java is not yet supported).
The lab instructions below apply to the Scala version of the Spark Shell.
Lab 3 - Using Random Forests for Classification with Spark MLlib
In this lab, we will learn how to use the Random Forests implementation in Spark's machine learning library, MLlib, to perform object classification.
The Random Forests algorithm is regarded as one of the most successful supervised learning algorithms and can be used for both classification and regression.
We will use the Python version of the library, which provides an API similar to those implemented in Scala and Java.
We will also use the spark-submit tool to submit the application from the command line rather than typing commands into the Spark Shell.
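A minimal sketch of what such an application might look like is shown here. It is not the lab's actual code: the file name `rf_classify.py`, the CSV layout (label in the first column, numeric features after it), the binary labels 0 and 1, and all hyperparameter values are assumptions made for illustration. The MLlib calls used are `RandomForest.trainClassifier` and `LabeledPoint` from the RDD-based Python API.

```python
import sys


def parse_line(line):
    """Split one CSV line into (label, features); the label is assumed to be column 0."""
    values = [float(x) for x in line.strip().split(",")]
    return values[0], values[1:]


def main(path):
    # Imported here so that parse_line can be used on a machine without Spark.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest

    sc = SparkContext(appName="RandomForestSketch")
    try:
        points = sc.textFile(path).map(parse_line).map(
            lambda lf: LabeledPoint(lf[0], lf[1]))
        train, test = points.randomSplit([0.7, 0.3], seed=42)

        # Assumed settings: two classes, no categorical features, ten trees.
        model = RandomForest.trainClassifier(
            train,
            numClasses=2,
            categoricalFeaturesInfo={},
            numTrees=10,
            featureSubsetStrategy="auto",
            impurity="gini",
            maxDepth=4,
            maxBins=32,
            seed=42,
        )

        # Predict on the held-out features, then pair each prediction with its label.
        predictions = model.predict(test.map(lambda p: p.features))
        labels_and_preds = test.map(lambda p: p.label).zip(predictions)
        total = test.count()
        errors = labels_and_preds.filter(lambda lp: lp[0] != lp[1]).count()
        if total:
            print("Test error: %.3f" % (errors / float(total)))
        else:
            print("Test split is empty")
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit rf_classify.py <csv-path>")
```

Under these assumptions, the script would be submitted with a command such as `spark-submit rf_classify.py data.csv`, where `data.csv` is a placeholder for the lab's data set.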
The Web Age Spark class can be delivered in a traditional classroom-style format. This Apache Spark training can also be delivered in a synchronous, instructor-led format.
Data Scientists, Business Analysts, Software Developers, IT Architects
Participants should have a general knowledge of statistics and programming.