Browse Our Free Resources
• The Spark Machine Learning Library (MLlib) provides an array of high-quality distributed Machine Learning (ML) algorithms
• The MLlib library implements a whole suite of statistical and machine learning algorithms (see Notes for details)
• MLlib provides tools for:
  • Building processing workflows (e.g. feature extraction and data transformation),
  • Parameter optimization, and
  • ML model management for model saving and loading
• MLlib applications run on top of Spark and take full advantage of Spark's distributed in-memory design
• MLlib applications are claimed to run 10x or more faster than applications that implement similar algorithms using Apache Mahout
• Apache Mahout apps leverage Hadoop's MapReduce engine
Machine Learning Algorithms in Apache Spark
WA2610 Machine Learning with Apache Spark
This intensive Apache Spark training course provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for Machine Learning. This training course is supplemented by a variety of hands-on labs that help attendees reinforce their theoretical knowledge of the learned material.
• The following options are available for running Spark applications on a cluster:
  • Spark Stand-alone - Spark's own cluster management system
    • Limited in terms of configuration options and scalability
  • External cluster management systems (the preferred option for large processing jobs):
    • Hadoop's YARN
• Running Spark using a cluster management system aids in computing efficiency, fault-tolerance, and scalability of your data processing solutions
• For development and prototyping, you can run Spark on a single (local) machine (without distributed processing capabilities)
• In all scenarios there is a Driver program (your Spark application or a Spark Shell session) which creates a Spark Context pointing to the Spark Master
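In practice, the choice among these options comes down to the master setting the Driver passes when it creates its Spark Context. The sketch below maps each option to the corresponding master string; it assumes Spark 2.x or later (where the YARN master is simply `yarn`), and the helper name `master_url` and its parameters are hypothetical, introduced only for illustration.

```python
def master_url(mode, host=None, port=7077, threads="*"):
    """Return the Spark master string for a deployment mode.

    mode is one of "local", "standalone", or "yarn" (names chosen for this sketch).
    """
    if mode == "local":
        # Single machine, no cluster manager; "*" means one thread per core.
        return f"local[{threads}]"
    if mode == "standalone":
        # Spark's own cluster manager; 7077 is its default master port.
        if not host:
            raise ValueError("standalone mode needs the Spark Master host")
        return f"spark://{host}:{port}"
    if mode == "yarn":
        # The cluster location is read from the Hadoop configuration,
        # so the master string carries no host or port.
        return "yarn"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode, kwargs in [("local", {}),
                         ("standalone", {"host": "master-host"}),
                         ("yarn", {})]:
        print(mode, "->", master_url(mode, **kwargs))
```

The resulting string is what would be supplied to `spark-submit --master` or to `SparkSession.builder.master(...)`; the rest of the application code stays the same across all three options.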
To Spark or Not to Spark?
WA2490 Spark Fundamentals
This high-octane Spark training course provides theoretical and technical aspects of Spark programming. The course teaches developers Spark fundamentals, APIs, common programming idioms and more. This Spark training course is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the learned material and quickly get them up to speed on using Spark for data exploration.
• R is a programming language and environment used for statistical computing and data analysis (http://www2.webagesolutions.com/e/7422/2017-06-08/5n8r4l/664559953)
• Distributed under the GNU General Public License
• Widely used by statisticians and data miners
• R is supported by a very active user community
• More than five thousand additional packages available at the Comprehensive R Archive Network (CRAN) and other repositories
• R is an interpreted implementation of the S statistical computing language with elements borrowed from the Scheme language
• For computationally intensive tasks, R can leverage C/C++ and FORTRAN routines that can be linked to R and called at run time
• In addition to the command line interface, R has several GUI environments; the primary GUI ships with R itself
• R supports the production of publication-quality statistical graphs
Using R as a tool for Business Analytics
WA2324 R Programming
This intensive training course helps students learn the practical aspects of the R programming language. The course is supplemented by many hands-on labs which allow attendees to immediately apply their theoretical knowledge in practice.
Copyright © 2012-2016 Web Age Solutions Inc.