Providing Technology Training and Mentoring For Modern Technology Adoption
US Inquiries / 1.877.517.6540
Canadian Inquiries / 1.877.812.8887
Course #: WA2950

Data Engineering with PySpark Training (Coming Soon)

Audience

Data Warehouse and Data Lake Specialists, Software Developers

Prerequisites

General background in programming and/or data processing; ability to learn a new language (Python) by doing stepwise exercises

Duration

Three days

Outline of Data Engineering with PySpark Training

Chapter 1. Defining Data Engineering

  • What is Data Engineering?
  • How is it different from Data Science?

Chapter 2. The Data Engineer Role

  • The scope of the Data Engineer (DE) role
  • Data Scientists, Machine Learning Specialists, and Data Engineers

Chapter 3. Data Processing Phases

  • Data Ingestion
  • Data Cleansing

Chapter 4. Distributed Computing Concepts

  • Data Physics
  • CAP Theorem
  • Hadoop

Chapter 5. Apache Spark

  • Supported Languages
  • Distributed Data Processing with PySpark

Chapter 6. Apache Spark Dev Environments

  • Spark Shells
  • Jupyter Notebooks

Chapter 7. Introduction to Functional Programming

  • Why Do I Need Functional Programming?
  • Functional Programming with Python
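
To give a flavor of the style this chapter covers, here is a minimal sketch in plain Python (no Spark required). The sample data and the names `readings`, `is_valid`, and `to_celsius` are hypothetical, chosen only for illustration; they are not taken from the course materials.

```python
from functools import reduce

# Hypothetical sample data: temperature readings in Fahrenheit,
# where None marks a missing value.
readings = [68.0, None, 77.0, 212.0, None, 32.0]

def is_valid(value):
    """Keep only readings that are present."""
    return value is not None

def to_celsius(fahrenheit):
    """Pure function: the output depends only on the input."""
    return (fahrenheit - 32.0) * 5.0 / 9.0

# filter: drop missing readings without mutating the original list.
valid = filter(is_valid, readings)

# map: apply the same pure function to every remaining element.
celsius = list(map(to_celsius, valid))

# reduce: fold the list into a single value (here, a running sum).
total = reduce(lambda acc, x: acc + x, celsius, 0.0)
average = total / len(celsius)

print(celsius, average)
```

The same filter, map, and reduce steps reappear as RDD transformations and actions in Chapter 8.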

Chapter 8. Functional Programming using Spark RDD API

  • RDD Transformations and Actions
  • Data Partitioning

Chapter 9. ETL Jobs with RDD

  • Using map-reduce FP for Data Processing

Chapter 10. Spark SQL DataFrames

  • What are DataFrames?
  • Relationship with RDDs
  • Ways to Create DataFrames
  • Schema of Datasets
  • Inferring the Schema

Chapter 11. SQL-centric Programming using DataFrames API

  • Using the sql Method, and the Native DataFrame API
  • Data Aggregation

Chapter 12. ETL Jobs with DataFrames

  • Using Spark SQL DataFrame API
  • Contrasting with Spark RDD API

Chapter 13. Repairing and Normalizing Data

  • What May Be Wrong With My Data?
  • Detecting and Removing Bad Data

Chapter 14. Data Visualization with seaborn

  • Exploratory Data Analysis (EDA)
  • Available Options

Chapter 15. Working with Various File Formats: CSV, Parquet, ORC, and JSON

  • What are Columnar Data Storage Formats?
  • Comparing Various Formats
  • Ways to Read and Store Data in Various Formats

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.