Objectives

One of the main objectives of DataOps is improving data quality, which determines the quality of the analytical work that supports tactical and strategic decision-making in organizations.

The DataOps for Data Professionals course focuses on ways organizations can raise the quality of their data, and on some of the technical and statistical aspects of that process.

Audience

Business Analysts, Data Scientists, Architects, and Technical Managers

Prerequisites

Practical work experience in data processing environments

Duration

One Day

 

Outline for DataOps for Data Professionals Training

Chapter 1. DataOps Introduction

  • Enterprise Data Processing Challenges and IT Systems' Woes:
    • Data Quality
    • What Makes Information Systems Cluttered and Myopic
    • Fragmented Data Sources
    • Different Data Formats
    • System Interoperability 
    • Maintenance Issues
  • Data-Related Roles
  • What is DataOps?
  • Problems that DataOps is Positioned to Solve
  • The DataOps Technology and Methodology Stack
  • The DataOps Manifesto
  • Agile Development
  • The Lean Manufacturing Methodology
  • Statistical Process Control (SPC)
  • Six Sigma
  • Roles and Responsibilities in DataOps
  • Promoting Teamwork
  • DataOps and Data Science Relationship
  • Key Components of a DataOps Platform
  • Overview of DataOps Platforms

Chapter 2. Data Quality

  • Data Quality Definitions
  • Dimensions of Data Quality
  • Data Observability
  • Defining "Bad" Data
    • Missing Data
    • Wrong/Incorrect Data or Data Format
    • Inconsistent Data
    • Outdated (Stale) Information
    • Unverifiable Data
    • Withheld Data 
  • Ensuring Data Quality
    • Dealing with "Bad" Input Data
      • DDL-Enforced Schema and Schema-on-Read (Schema-on-Demand)
      • SQL Constraints as Rules for Column-Level and Table-Wide Data
      • XML Schema Definition (XSD) for XML Documents 
      • Validating JSON Documents
      • Regular Expressions
      • Cleansing Data at Rest
      • Database Normalization
        • Normal Forms
        • When to Normalize or Denormalize
  • Data Consistency and Availability
    • Example of a Consistency vs Availability Gap: https://www.youtube.com/watch?v=A-brgkkjnHc   
  • Master (Authoritative) Data Management 
    • The "Golden Record"/"Ground Truth" Concept
  • Statistical Summary (Descriptive Statistics) of Datasets
  • Sampling Data Using Descriptive Statistics
  • Python Data Science (ML) and Visualization Libraries Overview
    • NumPy
    • pandas
    • scikit-learn
    • Matplotlib and seaborn
  • Exploratory Data Analysis (EDA)
    • Finding Outliers with Box Plots
    • Histograms and KDE
      • Univariate and Bivariate Distributions
    • Categorical Scatter Plots
    • Pair Plots
    • Heatmaps
    • Visualizing Multi-Dimensional Datasets
  • Enforcing Data Consistency with the scikit-learn LabelEncoder Class
  • Dealing with the Multicollinearity Problem
  • Creating Composite Features
  • Reducing Dimensionality of Datasets with Principal Component Analysis (PCA)
  • Using the Chi-Squared Test for Selecting Best-Fit Data
    • Using the scikit-learn SelectKBest Class and chi2 Function
  • Dealing with Missing (NaN) Data
  • Data Provenance
  • The Event Sourcing Pattern
  • Adopting the Culture of Automation
  • Ongoing Auditing
  • Monitoring and Alerting
  • Workflow (Pipeline) Orchestration Systems
    • DataOps Data Pipelines

Chapter 3. How to Lead with Data

  • DataOps Functional Architecture
  • The Snowflake Data Cloud
  • Cloud Design for System Resiliency
  • New Data Architecture:
    • Data Ownership
    • Shared Environment Security Controls

Chapter 4. Data Governance (Optional)

  • The Tragedy of the (Unmanaged) Commons
  • The Need for Data Governance
  • Controlling the Decision-Making Process
  • Controlling "Agile IT"
  • Types of Requirements
    • Product
    • Process
  • Scoping Requirements
  • Governance Gotchas
  • Governance Best Practices

Schedule

  • 03/11/2024, 10:00 AM - 06:00 PM Eastern Standard Time, Online Virtual Class, USD $810.00
  • 04/15/2024, 10:00 AM - 06:00 PM Eastern Standard Time, Online Virtual Class, USD $810.00