Providing Technology Training and Mentoring For Modern Technology Adoption
US Inquiries / 1.877.517.6540
Canadian Inquiries / 1.877.812.8887
Course #: WA2905

Data Engineering with Python Training

Data engineering is a software engineering practice focused on the design, development, and productionizing of data processing systems. It covers the practical aspects of data acquisition, transfer, transformation, and storage, whether on premises or in the cloud.
This intensive, hands-on training course teaches students how to apply Python to the practical aspects of data engineering and introduces the popular Python libraries used in the field, including NumPy, pandas, Matplotlib, scikit-learn, and PySpark (the Python API for Apache Spark).

Topics

  • Data engineering practice
  • High-octane introduction to Python
  • Technical reviews of NumPy, pandas, and other Python libraries and data processing systems
  • Data visualization and exploratory data analysis
  • Data repairing and normalization
  • Understanding the data needs and requirements of Machine Learning and Data Science projects
  • Python in the Cloud
  • Python on Hadoop (PySpark)

Audience

Developers, Software Engineers, Data Scientists, and IT Architects

Prerequisites

Participants are expected to have practical experience coding in one or more modern programming languages. Knowledge of Python is desirable but not necessary. Students are expected to learn new material quickly, reinforce each topic through programming exercises (labs), and then apply that knowledge in data engineering mini-projects.

Duration

Three days

Outline of Data Engineering with Python Training

Chapter 1. Data Engineering Defined  

  • Data is King
  • Translating Data into Business Insights
  • What is Data Engineering
  • The Data-Related Roles
  • The Data Science Skill Sets
  • The Data Engineer Role
  • An Example of a Data Product
  • Data Schema for Data Exchange Interoperability
  • The Data Exchange Interoperability Options
  • Big Data and NoSQL
  • Data Physics
  • The Traditional Client-Server Processing Pattern
  • Data Locality (Distributed Computing Economics)
  • The CAP Theorem
  • Mechanisms to Guarantee a Single CAP Property
  • The CAP Triangle
  • Eventual Consistency

Chapter 2. Data Processing Phases

  • Typical Data Processing Pipeline
  • Data Discovery Phase
  • Data Harvesting Phase
  • Data Priming Phase
  • Data Logistics and Data Governance
  • Exploratory Data Analysis
  • Model Planning Phase
  • Model Building Phase
  • Communicating the Results
  • Production Roll-out

Chapter 3. Introduction to Python Programming

  • Imperative and Functional programming
  • Python core functionality
  • Integrated development environments
  • Jupyter notebooks

Chapter 4. SciPy

  • SciPy ecosystem overview
  • Data engineering use cases

Chapter 5. NumPy

  • Introduction to NumPy
  • NumPy's value proposition
  • N-dimensional arrays
  • Broadcasting
  • Linear algebra capabilities
  • Data indexing, slicing, and iterating

Chapter 6. Pandas

  • Introduction to pandas
  • Pandas' data structures
  • Wrangling tabular data with pandas and NumPy
  • Merging, joining, and aggregating data
  • Dealing with categorical data
  • Time series
  • Visualization capabilities

Chapter 7. Matplotlib

  • Exploratory data analysis
  • Data visualization with Matplotlib

Chapter 8. Core Data Engineering Tasks

  • Data acquisition in Python
  • Database and Web interfaces
  • Ensuring data quality
  • Repairing and normalizing data
  • Computing descriptive statistics in Python
  • Processing data at scale
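
As a rough illustration of the "repairing and normalizing data" topic above, the following minimal sketch fills in missing values and rescales a column to the range [0, 1]. It is not taken from the course materials: the function names `repair_missing` and `min_max_normalize`, the choice of mean imputation, and the sample readings are all assumptions made for this example, and it uses only the Python standard library rather than pandas or NumPy.

```python
from statistics import mean


def repair_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is only one possible repair strategy (an assumption here).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sensor readings with one missing entry.
readings = [10.0, None, 30.0, 20.0]
repaired = repair_missing(readings)
normalized = min_max_normalize(repaired)
print(repaired)
print(normalized)
```

In practice the same two steps would more likely be written with pandas (for example, filling missing values and then rescaling a column), but the logic is the same.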

Chapter 9. Python in the Cloud

  • AWS Lambda
  • AWS Glue
  • AWS EMR

Chapter 10. PySpark

  • Scalable Computing Needs
  • Introduction to Apache Spark
  • Running PySpark on Hadoop
  • Spark SQL
  • The DataFrame Structure

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.