DataOps for IT Professionals Training

DataOps (Data Operations) is a process-oriented methodology, with supporting technologies, that serves the needs of data analytics teams throughout the entire data lifecycle: from data acquisition and storage to processing and consumption (turning the data into insights and information).

Course Details

Duration

1 day

Prerequisites

Practical work experience in data processing environments.

Target Audience

Data Engineers
Developers
Architects
Technical Managers

Skills Gained

Ensure a high level of data quality.

Course Outline

DataOps Introduction
- DataOps Enterprise Data Technologies
- Enterprise Data Processing Challenges and IT Systems' Woes:
  - Data Quality
  - What Makes Information Systems Cluttered and Myopic
  - Fragmented Data Sources
  - Different Data Formats
  - System Interoperability
  - Maintenance Issues
- Data-Related Roles
- Data Engineering
- What is DataOps?
- The DataOps Technology and Methodology Stack
- The DataOps Manifesto
- Agile Development
- DevOps
- The Lean Manufacturing Methodology
- Key Components of a DataOps Platform
- Overview of DataOps Tools and Services
- Overview of DataOps Platforms
Data Quality
- Data Quality Definitions
- Dimensions of Data Quality
- Defining "Bad" Data
  - Missing Data
  - Wrong/Incorrect Data or Data Format
  - Inconsistent Data
  - Outdated (Stale) Information
  - Unverifiable Data
  - Withheld Data
- Common Causes for "Bad" Data
  - Human Factor
  - Infrastructure- and Network-Related Issues
  - Software Defects
  - Using the Wrong Tool for the Job
  - Using Untrusted Data
  - Aggregation of Data from Disparate Data Sources that have Impedance Mismatch
  - Wrong QoS Settings of Queueing Systems
  - Wrong Caching System Settings, e.g. TTL
  - Not Using the "Ground Truth" Data
  - Differently Configured Development/UAT/Production Systems
    - How to Eliminate Environment Disparity
  - Confusing Big-Endian and Little-Endian Byte Order
- Ensuring Data Quality
  - Ensuring Integrity of Datasets
    - Dataset Checksums:
      - CRC (Cyclic Redundancy Check) as an Automatic Error-Detection Mechanism
      - MD5 and SHA-* Hashes
    - The Dataset Shapes for Basic Integrity Checks
- Dealing with "Bad" Input Data
  - DDL-enforced Schema & Schema-on-Demand (-on-Read)
  - SQL Constraints as Rules for Column-Level and Table-Wide Data
  - XML Schema Definition (XSD) for XML Documents
  - Validating JSON Documents
  - Regular Expressions
  - Data Cleansing of Data at Rest
  - Controlling Integrity of Data-in-Transit
  - Database Normalization
    - Normal Forms
    - When to De/normalize
  - Using Assertions in Applications
  - Operationalizing Input Data Validation
    - Microservices
    - API Management Solutions
- Data Consistency and Availability
  - Example of a Consistency vs Availability Gap: https://www.youtube.com/watch?v=A-brgkkjnHc
  - The CAP Triangle: Selecting Which System to Use
- Dealing with Duplicate Data
  - At Source
  - In Application
- Dealing with Missing (NaN) Data
  - Example of Using NumPy and pandas Python Libraries
- Master (Authoritative) Data Management
  - The "Golden Record"/"Ground Truth" Concept
- Enforcing Data Consistency with the scikit-learn LabelEncoder Class
- Data Provenance
- The Event Sourcing Pattern
- Adopting the Culture of Automation
- Ongoing Auditing
- Monitoring and Alerting
- UiPath
- Workflow (Pipeline) Orchestration Systems
  - DataOps Data Pipelines
  - Apache NiFi
How to Lead with Data
- Enterprise Architecture Components
  - Business Architecture
  - Information Architecture
  - Application Architecture
  - Technology Architecture
- DataOps Functional Architecture
- The Snowflake Data Cloud
- Cloud Design for System Resiliency
- New Data Architecture:
  - Data Ownership
  - Shared Environment Security Controls
Data Governance [OPTIONAL]
- The Need for Data Governance
- Controlling the Decision-Making Process
- Controlling "Agile IT"
- Types of Requirements
  - Product
  - Process
- Scoping Requirements
- Governance Gotchas
- Governance Best Practices
