To get the most out of this course, participants should have:
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience with developing applications using a common programming language such as Python.
- Familiarity with machine learning and/or statistics.
This course teaches participants the following skills:
- Design and build data processing systems on Google Cloud.
- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
- Derive business insights from extremely large datasets using BigQuery.
- Leverage unstructured data using Spark and ML APIs on Dataproc.
- Enable instant insights from streaming data.
- Understand ML APIs and BigQuery ML, and learn to use AutoML to create powerful models without coding.
Who Can Benefit?
This class is intended for developers who are responsible for:
- Extracting, loading, transforming, cleaning, and validating data
- Designing pipelines and architectures for data processing
- Integrating analytics and machine learning capabilities into data pipelines
- Querying datasets, visualizing query results, and creating reports
Outline for Data Engineering on Google Cloud Training
- Introduction to Data Engineering
- Building a Data Lake
- Building a Data Warehouse
- Introduction to Building Batch Data Pipelines
- Executing Spark on Dataproc
- Serverless Data Processing with Dataflow
- Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
- Introduction to Processing Streaming Data
- Serverless Messaging with Pub/Sub
- Dataflow Streaming Features
- High-Throughput BigQuery and Bigtable Streaming Features
- Advanced BigQuery Functionality and Performance
- Introduction to Analytics and AI
- Prebuilt ML Model APIs for Unstructured Data
- Big Data Analytics with Notebooks
- Production ML Pipelines
- Custom Model Building with SQL in BigQuery ML
- Custom Model Building with AutoML