Course #: WA2922

Data Engineering for Managers Training

1/2 Day

Outline of Data Engineering for Managers Training

Chapter 1. Defining Data Engineering

  • Data is King
  • What is Data Engineering?
  • The Data-Related Roles
  • The Data Engineer Role
  • Core Skills and Competencies
  • What is Data Wrangling (Munging)?
  • Typical Data Processing Pipeline
  • Data Discovery Phase
  • Data Harvesting Phase
  • Data Priming Phase
  • Exploratory Data Analysis
  • Model Planning Phase
  • Model Building Phase
  • Communicating the Results
  • Production Roll-out
  • Data Logistics and Data Governance
  • Data Processing Workflow Engines
  • Data Lineage and Provenance
  • The Traditional Client–Server Processing Pattern
  • Enter Distributed Computing
  • Data Physics
  • Data Locality (Distributed Computing Economics)
  • The CAP Theorem
  • Mechanisms to Guarantee a Single CAP Property
  • Eventual Consistency
  • What is Apache Spark?
  • The Spark Platform
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Resilient Distributed Dataset (RDD)
  • The Lineage Concept
  • Datasets and DataFrames
  • Data Partitioning
  • Data Partitioning Diagram
  • Python's Value
  • Python on AWS
  • What is Serverless Computing?
  • How Functions Work
  • What is AWS Glue?
  • Summary
