Students will learn the basics of Apache NiFi dataflow development, common patterns (feature extraction, database connectivity, etc.), debugging, and best practices. Completing this course will enable students to effectively develop, tune, and debug Apache NiFi dataflows.
Data Engineers, Engineering Managers, Architects, Data Scientists, and technical professionals with data-oriented responsibilities (e.g., scientists needing more near real-time data, data engineers looking to build a new data pipeline)
- Some familiarity with common data engineering practices
- Familiarity with the basics of modern data management: JSON, XML, SQL
- Programming is not required, but Java or Python skills will enable power users to leverage advanced functionality
- AWS or GCP familiarity is an advantage but not mandatory
Outline for Apache NiFi in the Cloud Crash Course
Day 1: Intro to Apache NiFi - 3 hrs
Goal: Understand the fundamentals of Apache NiFi.
Content: Introduce the core concepts of Apache NiFi (processor, connection,
FlowFile, Expression Language)
Day 1: Lab #1: Create a data pipeline - 3 hrs
Goal: Apply fundamental concepts of Apache NiFi to real-world examples
tailored to your specific domain. Examples: working with IoT data, parsing
financial reports, or managing a streaming feed of social media data.
Content: Expression language deep dive, basic functionality of common NiFi
processors
Lab: Practical lab tailing the log files of an Apache NiFi node to monitor for
specific actions. Students will analyze real system logs and filter dataflows
based on content.
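To give a feel for what "filtering based on content" means in this lab, here is a minimal Python sketch of the idea. It is not NiFi code: in NiFi this kind of routing is normally configured with processors such as TailFile and RouteOnContent rather than written by hand. The rule names, patterns, and sample log lines below are made up for illustration.

```python
import re

# Hypothetical routing rules (illustration only): route name -> pattern to look for.
ROUTES = {
    "errors": re.compile(r"\bERROR\b"),
    "warnings": re.compile(r"\bWARN\b"),
}


def route_lines(lines, routes=ROUTES):
    """Group log lines under the first rule whose pattern matches.

    Lines that match no rule are collected under "unmatched".
    """
    routed = {name: [] for name in routes}
    routed["unmatched"] = []
    for line in lines:
        for name, pattern in routes.items():
            if pattern.search(line):
                routed[name].append(line)
                break
        else:
            # No rule matched this line.
            routed["unmatched"].append(line)
    return routed


# Made-up sample lines, loosely shaped like application log output.
sample = [
    "2024-01-01 10:00:00,000 INFO [main] Started flow",
    "2024-01-01 10:00:01,000 ERROR [Timer-Driven] Failed to process",
    "2024-01-01 10:00:02,000 WARN [Timer-Driven] Queue nearly full",
]
result = route_lines(sample)
```

Each key in the result plays the role of a separate downstream path: one for errors, one for warnings, and one for everything else.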
Day 2 Part 1 - Data Structures and Usage in Apache NiFi - 3 hrs
Goal: Understand data structures and their usage in Apache NiFi
Content: How Apache NiFi works as an event-processing framework rather than a
batch framework, incorporating content from Days 1 and 2 and different
approaches to common use cases.
Day 2 Part 2 - Operating Apache NiFi in the Cloud and Lab #2 - 3 hrs
Goal: Learn how to operate Apache NiFi in the cloud
Content: Version control, process groups, controller services, security,
ecosystem integrations, and record-based processing.
Lab: Short outline of process groups and version-controlled flow templates