Course #: WA2950
Data Engineering with PySpark Training (Coming Soon)

Audience: Data Warehouse and Data Lake Specialists, Software Developers
Prerequisites: General background in programming and/or data processing; ability to learn a new language (Python) by doing stepwise exercises
Duration: Three days

Outline of Data Engineering with PySpark Training

Chapter 1. Defining Data Engineering
    What Is Data Engineering?
    How Is It Different from Data Science?

Chapter 2. The Data Engineer Role
    The Scope of the DE Role
    Data Scientists, Machine Learning Specialists, and Data Engineers

Chapter 3. Data Processing Phases
    Data Ingestion
    Data Cleansing

Chapter 4. Distributed Computing Concepts
    Data Physics
    CAP Theorem
    Hadoop

Chapter 5. Apache Spark
    Supported Languages
    Distributed Data Processing with PySpark

Chapter 6. Apache Spark Dev Environments
    Spark Shells
    Jupyter Notebooks

Chapter 7. Introduction to Functional Programming
    Why Do I Need Functional Programming?
    Functional Programming with Python

Chapter 8. Functional Programming using Spark RDD API
    RDD Transformations and Actions
    Data Partitioning

Chapter 9. ETL Jobs with RDD
    Using map-reduce FP for Data Processing

Chapter 10. Spark SQL DataFrames
    What Are DataFrames?
    Relationship with RDDs
    Ways to Create DataFrames
    Schema of Datasets
    Inferring the Schema

Chapter 11. SQL-centric Programming using DataFrames API
    Using the sql Method and the Native DataFrame API
    Data Aggregation

Chapter 12. ETL Jobs with DataFrames
    Using Spark SQL DataFrame API
    Contrasting with Spark RDD API

Chapter 13. Repairing and Normalizing Data
    What May Be Wrong With My Data?
    Detecting and Removing Bad Data

Chapter 14. Data Visualization with seaborn
    EDA
    Available Options

Chapter 15. Working with Various File Formats: CSV, Parquet, ORC, and JSON
    What Are Columnar Data Storage Formats?
    Comparing Various Formats
    Ways to Read and Store Data in Various Formats

We regularly offer classes in these and other cities.
Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.