
Spark and ML at Scale

August 22, 2023 by Denis Vrdoljak
Category: Big Data

These course chapters have been adapted from the Web Age course WA3290: Spark and Machine Learning at Scale Training to introduce you to the world of Spark, a powerful open-source big data processing engine used to create scalable machine learning solutions.

Contact us for the full version of this live, hands-on course taught by an expert instructor.

Chapter 8 – Introduction to Machine Learning at Scale

Key objectives of this chapter:

  • Introduction to Scalability
  • Common reasons for scaling up ML systems
  • How to avoid scaling infrastructure
  • Benefits of ML at Scale
  • Challenges in ML Scalability

8.1 Introduction to Scalability

Scalability refers to a system’s ability to handle increased or decreased
load and to respond swiftly to changes in application and system
processing requirements.

Machine learning scalability refers to building ML applications that can
handle any amount of data and perform many computations in a cost-effective,
time-saving way to serve millions of users instantly.

8.2 Introduction to Scalability

Data teams are expected to build scalable applications that:

  • Work for millions of users
  • Reach users residing in millions of locations
  • Respond at a reasonable speed

8.3 Common Reasons for Scaling Up ML Systems

Scaling up an ML system is sometimes necessary. Common reasons include:

  • The training data doesn’t fit on a single machine.
  • The time to train a model is too long.
  • The volume of data coming in is too high.
  • The latency requirements for predictions are strict (responses must be fast).

8.4 How to Avoid Scaling Infrastructure?

You can avoid spending time and resources on a scalable infrastructure by:

  • Choosing a different ML algorithm
  • Subsampling the data
  • Scaling up vertically (upgrading the machine)
  • Sacrificing accuracy or easing other constraints
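As a rough illustration of the subsampling option, the sketch below (plain Python; the dataset and fraction are hypothetical, not from the course) keeps a random fraction of the rows so training can stay on a single machine:

```python
import random

def subsample(rows, fraction, seed=42):
    """Keep a random fraction of the rows so training fits on one machine."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

# Hypothetical dataset of 1,000,000 rows; keep roughly 10% of them.
rows = list(range(1_000_000))
sample = subsample(rows, fraction=0.10)
```

Fixing the seed keeps the sample reproducible, which matters when comparing models trained on the reduced dataset.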

8.5 Benefits of ML at Scale

  • Promotes ML automation and reduces cost
  • Enhances modularization and team collaboration
  • Boosts productivity
  • Boosts model performance

8.6 Challenges in ML Scalability

When implementing scalable ML infrastructure, you might face several challenges:

  • Data Complexities
  • ML System Engineering
  • Integration Risks
  • Collaboration issues

8.7 Data Complexities – Challenges

  • Data fuels machine learning.
    • ML model training is expensive and is challenged by data complexities.
    • You need to make sure the data is feasible to work with and predictable.
    • What makes data complex:
      • Growth rate
      • Size
      • Type
      • Structure
      • Detail
      • Query language
      • Dispersion (data spread across locations)

8.8 ML System Engineering – Challenges

  • A scalable ML system needs to be engineered with specific requirements.
  • Choosing a suitable infrastructure and technical stack is crucial.
  • An inappropriately designed ML solution incurs extra cost.
  • Example:
    • A data scientist may prototype in Python with tools like Pandas.
    • At scale, Spark and PySpark may be more desirable.

8.9 Integration Risks – Challenges

  • Scaling an ML project requires a scalable production environment well integrated with modeling technologies.
  • ML at scale requires proper integration between various teams.
  • Pay attention to:
    • Workflow automation
    • Process Standardization
    • Testing practices

8.10 Collaboration Issues – Challenges

  • Maintaining transparent communication between:
    • Data Science team
    • Data Engineering team
    • DevOps team
    • Other relevant team(s)
  • Assigning roles, giving detailed access, and monitoring every team is complex.

Chapter 9 – Machine Learning at Scale – Distributed Training of Machine Learning Models

Key objectives in this chapter include:

  • Introduction to Distributed Training
  • Data Parallelism
  • Model Parallelism
  • Distributed Training
  • Distributed Inference
  • Inference Challenges
  • GPU for Training
  • GPU for Inference
  • AWS Inferentia Chip

9.1 Introduction to Distributed Training

  • There are two main types of Distributed Training:
    • Data parallelism
    • Model Parallelism

9.2 Data Parallelism

  • The entire model is deployed to multiple cluster nodes, and the data is horizontally split.
  • Each instance of the model works on a part of the data.

9.3 Steps of Data Parallelism

  • A data parallelism framework mainly performs the following three tasks:
    • It creates and dispatches copies of the model.
    • It shards the data and then distributes it to the corresponding devices.
    • It finally aggregates all results together in the backpropagation step.
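The three tasks above can be sketched in plain Python (a toy single-weight model and a simulated, sequential "cluster" — the names and the learning-rate value are illustrative assumptions, not from the course):

```python
import copy

def shard(data, n_workers):
    """Task 2: split the data into roughly equal shards, one per device."""
    k, r = divmod(len(data), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        end = start + k + (1 if i < r else 0)
        shards.append(data[start:end])
        start = end
    return shards

def local_gradient(model, shard_data):
    """Each model copy computes a gradient on its own shard.

    Toy model: a single weight w fit to y = 2x; squared-error gradient."""
    w = model["w"]
    return sum(2 * (w * x - 2 * x) * x for x in shard_data) / len(shard_data)

def data_parallel_step(model, data, n_workers=4, lr=0.01):
    replicas = [copy.deepcopy(model) for _ in range(n_workers)]  # Task 1: dispatch copies
    shards = shard(data, n_workers)                              # Task 2: shard the data
    grads = [local_gradient(m, s) for m, s in zip(replicas, shards)]
    avg_grad = sum(grads) / n_workers                            # Task 3: aggregate (all-reduce)
    model["w"] -= lr * avg_grad
    return model
```

In a real framework the replicas run on separate devices and the aggregation is an all-reduce over the network; here the loop is sequential purely to show the data flow.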

9.4 Data Parallelism vs Random Forest

9.5 Model Parallelism

  • The model is segmented into different parts that can run concurrently on different nodes.
  • Each node runs on the same data.
  • Scalability depends on how well the algorithm parallelizes.
  • It is more complex to implement than data parallelism.
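As a minimal sketch of the idea, the model below is split into two stages, conceptually placed on different devices (the "layers" are hypothetical placeholders; a real framework would hand data between devices rather than between function calls):

```python
def stage_1(x):
    """First part of the model (imagine this running on device 0)."""
    return [v * 2 for v in x]   # hypothetical layer: scale

def stage_2(h):
    """Second part of the model (imagine this running on device 1)."""
    return [v + 1 for v in h]   # hypothetical layer: shift

def forward(batch):
    # In a real framework the hand-off between stages is a
    # device-to-device transfer; here it is just a function call.
    return stage_2(stage_1(batch))

print(forward([1, 2, 3]))  # → [3, 5, 7]
```

The hand-off between stages is where model parallelism gets its implementation complexity: stages must be kept busy and intermediate activations must be shipped between devices.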

9.6 Frameworks for Implementing Distributed ML

Some frameworks that implement distributed ML:

  • Apache Spark
  • Baidu AllReduce
  • Horovod
  • Caffe2
  • Microsoft Cognitive Toolkit (CNTK)
  • DistBelief
  • TensorFlow
  • PyTorch

9.7 Introduction to Distributed Training vs Distributed Inference

Machine Learning works in two main phases:

  • Training
  • Inference

9.8 Introduction to Training

  • What is Training?
    • Training refers to the process of using a machine learning algorithm to build a model.
  • Training involves:
    • Algorithm / a deep learning framework
    • Training dataset

9.9 Introduction to Inference

What is Inference?

  • Inference is the process of using a trained model to produce an actionable output.
  • It usually happens live
  • Examples:
    • Speech recognition
    • Real-time language translation
    • Machine vision

9.10 Key Components of Inference

  • Main steps and components of the Inference system:
    • Accepts inputs from end-users
    • Processes the data
    • Feeds it into the ML model
    • Serves outputs back to users
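The four components above can be sketched as a tiny pipeline (plain Python; the comma-separated request format and the fixed weights are illustrative assumptions, not from the course):

```python
def preprocess(raw):
    """Process the raw input accepted from the end-user (parse and convert)."""
    return [float(v) for v in raw.split(",")]

def predict(features, weights):
    """Feed the processed data into the (hypothetical) trained model."""
    return sum(f * w for f, w in zip(features, weights))

def serve(raw_request, weights=(0.5, 0.5)):
    """Accept input, process it, run the model, and serve the output back."""
    features = preprocess(raw_request)
    score = predict(features, weights)
    return {"score": score}

print(serve("1.0,3.0"))  # → {'score': 2.0}
```

A production inference system wraps the same four steps behind an HTTP or RPC endpoint, but the flow is unchanged.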

9.11 Inference Challenges

  • There are three primary challenges when setting up ML Inference:
    • Latency
    • Interoperability
    • Infrastructure Cost

9.12 Inference Challenges – Latency

  • A common requirement for inference systems is low latency:
    • Mission-critical applications often require real-time inference.
    • Examples:
      • Autonomous navigation
      • Critical material handling
      • Medical equipment
    • Some use cases can tolerate higher latency
      • You can run these analyses in batches.

9.13 Inference Challenges – Interoperability

  • Challenges related to Interoperability:
    • Different teams use different frameworks to solve problems.
      • Tensorflow
      • PyTorch
      • Keras
    • In production inference, these models need to work well together.
    • Different environments for models to run
      • Client devices
      • In the cloud
    • Containerization is a common practice to solve such problems.

9.14 Inference Challenges – Infrastructure Cost

  • Challenges related to Infrastructure Cost:
    • The cost of inference is a key factor in running ML models effectively.
    • ML models are often computationally intensive.
    • The goal is to minimize the cost per inference.
      • One solution: run queries concurrently or in batches.
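To show why batching lowers cost per inference, the sketch below groups requests so the (hypothetical) model is invoked once per batch instead of once per request — per-invocation overhead is amortized across the batch:

```python
def batched(requests, batch_size):
    """Group incoming requests so the model is invoked once per batch."""
    for i in range(0, len(requests), batch_size):
        yield requests[i:i + batch_size]

def run_model(batch):
    """Hypothetical model call; one invocation serves the whole batch."""
    return [x * 2 for x in batch]

requests = list(range(10))
results, calls = [], 0
for batch in batched(requests, batch_size=4):
    results.extend(run_model(batch))
    calls += 1
# 10 requests served with only 3 model invocations instead of 10
```

The trade-off, as the latency section notes, is that batching adds waiting time, so it suits use cases that can tolerate higher latency.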

9.15 Training vs. Inference

  • Training is more compute-intensive than inference.
    • Training and inference are different workloads.
  • Training is a building block for inference.
  • The estimated split between training and inference in commercial AI instances is on the side of inference. (source: TIRIAS Research)

9.16 Introduction to GPUs

  • Machine Learning Training and Inference may use:
    • GPU for Training
    • GPU for Inference
    • AWS Inferentia Chip

9.17 GPU

  • The graphics processing unit (GPU) is a specialized hardware component capable of performing many fundamental tasks simultaneously.
  • Nvidia popularized the term “GPU” with the launch of the GeForce 256 in 1999.
  • GPUs are designed for parallel processing.
  • GPUs are used for:
    • Graphics
    • Video rendering applications
    • Artificial intelligence

9.18 GPU for Training

  • Why should we use GPU for ML Training?
    • GPU can save time on model training.
    • It allows you to execute models with a large number of parameters.
    • It allows you to parallelize your training.
    • It allows you to perform multiple computing operations at the same time.

9.19 GPU for Training

  • Factors in deciding whether to integrate GPUs into your ML pipeline:
    • Memory bandwidth: GPUs offer the bandwidth needed to support big datasets.
      • Dedicated video RAM (VRAM) frees the CPU for other operations.
    • Dataset size: GPUs can scale more readily than CPUs.
      • The more data you have, the more advantage you may get from GPUs.
    • Optimization: GPU workloads can be more difficult to optimize than CPU workloads.

9.20 GPU for Inference

  • Critical decision criteria for using GPUs for inference:
    • Speed
      • If a model can’t analyze data quickly enough, it can’t be used in practice.
    • Cost
      • The model becomes too expensive if it consumes too much energy.
    • Accuracy
      • Without accuracy, data science and ML provide little value.

9.21 Inference – Hardware

  • When choosing hardware for inference, consider the following:
    • How critical is it that your inference performance is good?
    • Should you minimize latency or maximize throughput?
    • Is the batch size of the data large?
    • How much extra are you willing to spend to get better results?

9.22 Inference – Hardware

  • Suggestions:
    • If speed is not an issue, choose a CPU.
    • When inference speed becomes a bottleneck in the application, upgrade to a GPU.
    • Target throughput, latency, and cost together.
      • Deliver a good customer experience on a budget.
    • A CPU is a good fit for most ML algorithms.
      • CPUs still offer a compelling level of performance for inference applications.
    • Keep your GPU utilization at its maximum at all times.
      • With sporadic inference requests, your cost per inference request goes up.

9.23 AWS Inferentia Chip

  • AWS Inferentia Chip is Amazon’s custom chip for Machine Learning
    • Introduced in 2018
    • According to Amazon:
      • “AWS Inferentia provides high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput.”

9.24 AWS Inferentia Chip vs GPU

  • “Compared to GPU-based instances, Inferentia has led to a 25% lower end-to-end latency, and 30% lower cost for Alexa’s text-to-speech (TTS) workloads.” (Amazon)
  • Test done by Amazon using pretrained BERT base models.
  • Results: AWS Inferentia vs GPUs:
    • 12 times higher throughput
    • 70% lower cost

Contact us for the full version of this live, hands-on Spark and ML at Scale training course taught by an expert instructor.
