Objectives

One of the main objectives of DataOps is improving data quality, which directly affects the quality of the analytical work that supports tactical and strategic decision-making in organizations.

The DataOps for IT Professionals course focuses primarily on practical ways to help organizations raise the quality of their data, and on the technologies that support this process.

Audience

Data Engineers, Developers, Architects, and Technical Managers 

Prerequisites

Practical work experience in data processing environments

Duration 

One Day

Outline for DataOps for IT Professionals Training

Chapter 1. DataOps Introduction

  • DataOps Enterprise Data Technologies
  • Enterprise Data Processing Challenges and IT Systems' Woes:
    • Data Quality
    • What Makes Information Systems Cluttered and Myopic
    • Fragmented Data Sources
    • Different Data Formats
    • System Interoperability 
    • Maintenance Issues
  • Data-Related Roles
  • Data Engineering
  • What is DataOps?
  • The DataOps Technology and Methodology Stack
  • The DataOps Manifesto
  • Agile Development
  • DevOps
  • The Lean Manufacturing Methodology
  • Key Components of a DataOps Platform
  • Overview of DataOps Tools and Services
  • Overview of DataOps Platforms

Chapter 2. Data Quality

  • Data Quality Definitions
  • Dimensions of Data Quality
  • Defining "Bad" Data
    • Missing Data
    • Wrong/Incorrect Data or Data Format
    • Inconsistent Data
    • Outdated (Stale) Information
    • Unverifiable Data
    • Withheld Data
  • Common Causes of "Bad" Data
    • Human Factor
    • Infrastructure- and Network-Related Issues
    • Software Defects
    • Using the Wrong Tool for the Job
    • Using Untrusted Data
    • Aggregating Data from Disparate Sources with an Impedance Mismatch
    • Wrong QoS Settings of Queueing Systems
    • Wrong Caching System Settings (e.g., TTL)
    • Not Using the "Ground Truth" Data
    • Differently Configured Development/UAT/Production Systems
      • How to Eliminate Environment Disparity
    • Confusing Big-Endian and Little-Endian Byte Order
  • Ensuring Data Quality
    • Ensuring Integrity of Datasets 
      • Dataset Checksums:
        • CRC (Cyclic Redundancy Check) as an Automatic Error-Detection Mechanism
        • MD5 and SHA-* Hashes 
      • Using Dataset Shapes for Basic Integrity Checks
  • Dealing with "Bad" Input Data
    • DDL-Enforced Schema (Schema-on-Write) and Schema-on-Demand (Schema-on-Read)
    • SQL Constraints as Rules for Column-Level and Table-Wide Data
    • XML Schema Definition (XSD) for XML Documents 
    • Validating JSON Documents
    • Regular Expressions
    • Cleansing Data at Rest
    • Controlling the Integrity of Data in Transit
    • Database Normalization
      • Normal Forms
      • When to Normalize and When to Denormalize
    • Using Assertions in Applications 
    • Operationalizing Input Data Validation
      • Microservices
      • API Management Solutions
  • Data Consistency and Availability
    • Example of a Consistency vs Availability Gap: https://www.youtube.com/watch?v=A-brgkkjnHc  
    • The CAP Triangle: Selecting Which System to Use
  • Dealing with Duplicate Data
    • At Source
    • In Application
  • Dealing with Missing (NaN) Data
    • Example Using the NumPy and pandas Python Libraries
  • Master (Authoritative) Data Management 
    • The "Golden Record"/"Ground Truth" Concept
  • Enforcing Data Consistency with the scikit-learn LabelEncoder Class
  • Data Provenance
  • The Event Sourcing Pattern
  • Adopting the Culture of Automation
  • Ongoing Auditing
  • Monitoring and Alerting
  • UiPath
  • Workflow (Pipeline) Orchestration Systems
    • DataOps Data Pipelines
    • Apache NiFi
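
The dataset integrity checks listed in Chapter 2 (CRC, MD5/SHA-* checksums, and dataset shapes) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own material: it assumes the data producer publishes a "manifest" (a hypothetical dict with crc32, md5, sha256, and shape keys) next to a CSV file, and it uses only the standard library (zlib, hashlib, csv).

```python
import csv
import hashlib
import io
import zlib


def dataset_fingerprint(data: bytes) -> dict:
    """Compute several checksums over the raw bytes of a dataset."""
    return {
        # CRC-32: cheap detection of accidental corruption; trivially forgeable.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        # MD5: still catches accidental changes, but is not collision-resistant.
        "md5": hashlib.md5(data).hexdigest(),
        # SHA-256: use when the checksum must also resist deliberate tampering.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def csv_shape(data: bytes, encoding: str = "utf-8") -> tuple:
    """Return (data_rows, columns) of a CSV payload; the header row is not counted.

    Raises ValueError if a record has a different field count than the header
    (blank lines count as zero-field records) or if decoding fails.
    """
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    header = next(reader, None)
    if header is None:
        return (0, 0)
    n_rows = 0
    for row in reader:
        if len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
        n_rows += 1
    return (n_rows, len(header))


def verify(data: bytes, manifest: dict) -> list:
    """Compare a received dataset with its hypothetical manifest; return problems found."""
    problems = []
    actual = dataset_fingerprint(data)
    for algo in ("crc32", "md5", "sha256"):
        if algo in manifest and manifest[algo] != actual[algo]:
            problems.append(f"{algo} mismatch")
    if "shape" in manifest:
        expected_shape = tuple(manifest["shape"])
        try:
            shape = csv_shape(data)
        except ValueError as exc:  # also covers UnicodeDecodeError
            problems.append(f"malformed CSV: {exc}")
        else:
            if shape != expected_shape:
                problems.append(f"shape mismatch: expected {expected_shape}, got {shape}")
    return problems


if __name__ == "__main__":
    original = b"id,name\n1,alice\n2,bob\n"
    manifest = {**dataset_fingerprint(original), "shape": (2, 2)}
    print(verify(original, manifest))  # → []
    # An extra record changes every checksum and the row count.
    print(verify(original + b"3,carol\n", manifest))
```

Checking the shape alongside the hashes helps with triage: a hash mismatch only says that the bytes changed, while the row and column counts hint at whether records were added, dropped, or truncated along the way.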

Chapter 3. How to Lead with Data

  • Enterprise Architecture Components
    • Business Architecture
    • Information Architecture
    • Application Architecture
    • Technology Architecture
  • DataOps Functional Architecture
  • The Snowflake Data Cloud
  • Cloud Design for System Resiliency
  • New Data Architecture:
    • Data Ownership
    • Shared Environment Security Controls

Chapter 4. Data Governance (Optional)

  • The Need for Data Governance
  • Controlling the Decision-Making Process
  • Controlling "Agile IT"
  • Types of Requirements
    • Product
    • Process
  • Scoping Requirements
  • Governance Gotchas
  • Governance Best Practices

04/15/2024
10:00 AM - 06:00 PM
Eastern Time (ET)
Online Virtual Class
USD $810.00