Cloud Data Engineering with NiFi on AWS or GCP Training
This course is intended for students looking to learn data processing on the cloud with Apache NiFi - a visually programmed software tool that automates the movement and transformation of data between systems. Course material will cover data engineering theory and practical development advice.
Duration
3 days (six 3-hour sessions)
Prerequisites
- Some familiarity with common data engineering practices
- Familiarity with the basics of modern data management: JSON, XML, SQL
- Programming is not required, but Java or Python skills will enable power users to leverage advanced functionality
- AWS or GCP familiarity is an advantage but not mandatory
Target Audience
- Data Engineers
- Engineering Managers
- Architects
- Data Scientists
- Technical professionals with data-oriented responsibilities (e.g. scientists needing more near real-time data, data engineers looking to build a new data pipeline)
Skills Gained
- Understand the basics of Apache NiFi dataflow development, common patterns (feature extraction, database connectivity, etc.), debugging, and best practices.
Day 1: Intro to Apache NiFi - 3 hrs
Goal: Understand the fundamentals of Apache NiFi.
Content: Introduce the core concepts of Apache NiFi (processor, connection, flowfile, expression language)
Day 1: Lab #1: Create a data pipeline - 3 hrs
Goal: Apply fundamental concepts of Apache NiFi to real-world examples tailored to your specific domain. Examples: working with IoT data, parsing financial reports, or managing a streaming feed of social media data.
Content: Expression language deep dive, basic functionality of common NiFi processors
Lab: Practical lab tailing the log files of an Apache NiFi node to monitor for specific actions. Students will analyze real system logs and filter dataflows based on content.
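The content-based filtering practiced in this lab can be sketched outside NiFi in a few lines of Python. This is only an illustration of the idea, not the lab solution: in NiFi itself the work would typically be done with processors such as TailFile and RouteOnContent. The sample lines, the line pattern, and the function name filter_by_level are all assumptions made for this sketch.

```python
import re

# Hypothetical sample lines written in the general style of a NiFi application log.
# They are made up for illustration and are not real log output.
SAMPLE_LOG = [
    "2024-01-01 10:00:00,000 INFO [Timer-Driven Process Thread-1] o.a.n.p.standard.TailFile Started tailing file",
    "2024-01-01 10:00:01,000 WARN [Timer-Driven Process Thread-2] o.a.n.p.standard.InvokeHTTP Request timed out",
    "2024-01-01 10:00:02,000 ERROR [Timer-Driven Process Thread-3] o.a.n.p.standard.PutSQL Failed to update database",
]

# Assumed layout: "<date> <time> <LEVEL> [<thread>] <message>".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<message>.*)$"
)


def filter_by_level(lines, levels):
    """Yield parsed log records whose level is in the given set."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and match.group("level") in levels:
            yield match.groupdict()


# Keep only warnings and errors, similar to routing matching content to its own relationship.
matches = list(filter_by_level(SAMPLE_LOG, {"WARN", "ERROR"}))
for record in matches:
    print(record["level"], record["message"])
```

Parsing each line into named fields first, and filtering second, mirrors the way a dataflow separates extraction from routing.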
Day 2: Operating Apache NiFi in the Cloud - 3 hrs
Goal: Learn how to operate Apache NiFi in the cloud.
Content: Version control, process groups, controller services, security, ecosystem integrations, and record-based processing.
Lab: Short outline of process groups and version-controlled flow templates
Day 2: Real World Examples & Lab #2: Customizing dataflows - 3 hrs
Goal: Build an understanding of real-world problems in the students' domains.
Content: Instructor-led discussion of the types of problems students may face in their day-to-day jobs. (All instructors are experienced in consulting and professional services in addition to teaching.)
Lab: Expansion of Day 1 lab with more complex features and version control.
Day 3: Data Structures and Usage in Apache NiFi & Lab #3 - 3 hrs
Goal: Understand data structures and their usage in Apache NiFi.
Content: How Apache NiFi works as an event processing framework rather than a batch framework, building on content from Days 1 and 2, and different approaches to common use cases.
Lab: Lab with domain-specific data to learn how to transform data from its default state into more consumable forms. Students will be tasked with creating key-value pairs, bulk JSON and XML documents, and even performing calculations. Schemas will need to be built and validated, leading to open-ended discovery. This will enable students to practice patterns used to make data more useful.
(For example, a finance domain data lab will use real-world financial data feeds and challenge students to store data in multiple different formats to support a variety of use-cases.)
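The kinds of transformations named in this lab (key-value pairs, bulk JSON, XML, and a calculated field) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than course material: the trade records, their field names, and the helper names flatten, with_notional, and to_xml are all hypothetical, and in NiFi the same steps would usually be built from record-oriented processors instead of code.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical finance-style records; field names and values are made up for illustration.
TRADES = [
    {"symbol": "ABC", "price": 10.0, "quantity": 3, "venue": {"name": "X", "country": "US"}},
    {"symbol": "XYZ", "price": 2.5, "quantity": 4, "venue": {"name": "Y", "country": "GB"}},
]


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted key-value pairs, e.g. 'venue.name'."""
    pairs = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.update(flatten(value, prefix=f"{name}."))
        else:
            pairs[name] = value
    return pairs


def with_notional(record):
    """Return a copy of the record with a calculated field: price times quantity."""
    return {**record, "notional": record["price"] * record["quantity"]}


def to_xml(record, root_tag="trade"):
    """Serialize a record as a flat XML element, one child per key-value pair."""
    root = ET.Element(root_tag)
    for key, value in flatten(record).items():
        # Dots are replaced with underscores to keep the tag names simple.
        child = ET.SubElement(root, key.replace(".", "_"))
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


enriched = [with_notional(trade) for trade in TRADES]   # calculation step
key_values = [flatten(trade) for trade in enriched]     # key-value pairs
bulk_json = json.dumps(enriched)                        # one JSON array holding all records
xml_docs = [to_xml(trade) for trade in enriched]        # one XML document per record
```

Storing the same records in several shapes, as here, is one way to compare which format best serves a given downstream use case.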
Day 3: Practice and Customization of Data Pipelines, Lab #3 - 3 hrs
Goal: Wrap up the lab and review how each student solved the open-ended problems. Volunteers may present their solutions, and the class will discuss the advantages and disadvantages of different approaches.
Lab: Continuation of lab exercise #3 - domain-specific data lab