Outline for Implementing Data Engineering on Azure Training
Chapter 1. Understanding Data Engineering
Overview Of Traditional Database Engineer And BI Developer Roles
When Relational Databases And SQL Aren’t Enough
Transitioning From The Traditional Database Engineer Role To The Data Engineer Role
The Modern Data Sources (Relational, Non-Relational, Real-Time & Streaming)
Chapter 2. Azure Data Lake Overview
What Is Azure Data Lake
Importance Of Data Lake
Gen1 Vs. Gen2
Hierarchical Namespace In Gen2
Chapter 3. Using U-SQL
What Is The U-SQL Language
Write, Run, And Manage Analytics Jobs
Extend U-SQL Using Python
Chapter 4. Monitoring And Optimizing U-SQL Jobs
Schedule U-SQL Jobs
Manage U-SQL Jobs
Troubleshoot U-SQL Jobs
Performance Optimization
Chapter 5. Cosmos DB
MOC 20777 content (there are 6 modules in the course)
Chapter 6. Introduction To Big Data Formats
Why Different Formats Emerged
The Evolution Of Data Formats
Use Cases For Different Formats
Understanding Avro
Understanding Parquet
Understanding Optimized Row Columnar (ORC)
Challenges Involved In Converting Formats
Chapter 7. Azure Data Factory Overview
What Is Azure Data Factory
Understanding Automated Data Pipelines
Understanding Data Sets
Understanding Activities
Chapter 8. Developing Azure Data Factory Pipelines
Understanding Options For Developing Data Factory Pipelines
Setup Source
Setup Sink
Setup Mappings
Validate, Publish, And Test
Developing Data Factory Pipelines Using Python
Chapter 9. Managing Azure Data Factory Jobs
Scheduling Data Factory Jobs
Executing Data Factory Jobs
Monitoring Data Factory Jobs
Understanding Tumbling Windows
Understanding Concurrency
Understanding Dependency
Understanding Troubleshooting
Chapter 10. Getting Started on Microsoft HDInsight
Introduction to Hadoop
Working with MapReduce Functions
Introduction to HDInsight
Understanding HDInsight Cluster Types
Deploying HDInsight Clusters
Chapter 11. Applying Data Engineering on Microsoft HDInsight
Understanding Various Data Loading Tools
Loading Data into HDInsight
Understanding Apache Hive Solutions
HDInsight Data Queries using Hive and Pig
Chapter 12. Putting It Together
Combining Azure Data Factory With Azure HDInsight
Combining Azure Data Factory With Azure Machine Learning
Transforming And Processing Raw Data Into Predictions And Insights
Chapter 13. Implementing Streaming Solutions with Kafka and HBase
Introduction to Kafka and HBase
Deploying a Kafka Cluster
Publishing, Consuming, and Processing Data
Storing Data to HBase
Querying Data in HBase
Chapter 14. Introduction to Streaming Data using Apache Spark
Exploring Sources and Sinks
Understanding Streaming DataFrames
Understanding Window Operations on DataFrames
Introduction to Streaming Joins
Monitoring Streaming Queries
Chapter 15. Implementing Streaming Solutions with Databricks
Introduction to Structured Streaming on Azure Databricks
Setting up Azure Databricks
Configuring Source and Sink
Building a Streaming Pipeline on Azure Databricks
Working with Timestamps and Windows
Understanding Stateful Operations
Handling Multiple Streams and Datasets
Optimizing a Streaming Pipeline for Production Use
Chapter 16. Implementing Real-time Processing Solutions with Apache Storm
How to Persist Long-term Data
Streaming Data with Apache Storm
Understanding Apache Storm Topologies
Creating Apache Storm Topologies
Configuring Apache Storm