Course #: WA2914
Introduction to Python and PySpark Training

Upcoming sessions (Instructor Led Virtual, USD $1,995.00 each):
05/17/2021 - 05/19/2021
05/25/2021 - 05/27/2021
06/28/2021 - 06/30/2021
07/06/2021 - 07/08/2021
07/19/2021 - 07/21/2021
08/03/2021 - 08/05/2021

Courseware: Available for sale

This three-day course gives Developers and/or Data Analysts a gentle, immersive, hands-on introduction to the Python programming language and Apache PySpark.

Audience: Developers and/or Data Analysts
Prerequisites: Programming and/or scripting experience in a language other than Python
Duration: Three days

Outline of Introduction to Python and PySpark Training

Chapter 1. Introduction to Python
What is Python; Uses of Python; Installing Python; Python Package Manager (PIP); Using the Python Shell; Python Code Conventions; Importing Modules; The Help(object) Command; The Help Prompt; Summary

Chapter 2. Python Scripts
Executing Python Code; Python Scripts; Writing Scripts; Running Python Scripts; Self Executing Scripts; Accepting Command-Line Parameters; Accepting Interactive Input; Retrieving Environment Settings; Summary

Chapter 3. Data Types and Variables
Creating Variables; Displaying Variables; Basic Concatenation; Data Types; Strings; Strings as Arrays; String Methods; Combining Strings and Numbers; Numeric Types; Integer Types; Floating Point Types; Boolean Types; Checking Data Type; Summary

Chapter 4. Python Collections
Python Collections; List Type; Modifying Lists; Sorting a List; Tuple Type; Python Sets; Modifying Sets; Dictionary (Map) Type; Dictionary Methods; Sequences; Summary

Chapter 5.
Control Statements and Looping
If Statement; elif Keyword; Boolean Conditions; Single Line If Statements; For-in Loops; Looping over an Index; Range Function; Nested Loops; While Loops; Exception Handling; Built-in Exceptions; Exceptions Thrown by Built-In Functions; Summary

Chapter 6. Functions in Python
Defining Functions; Naming Functions; Using Functions; Function Parameters; Named Parameters; Variable Length Parameter List; How Parameters are Passed; Variable Scope; Returning Values; Docstrings; Best Practices; Single Responsibility; Returning a Value; Function Length; Pure and Idempotent Functions; Summary

Chapter 7. Working With Data in Python
Data Type Conversions; Conversions from Other Types to Integer; Conversions from Other Types to Float; Conversions from Other Types to String; Conversions from Other Types to Boolean; Converting Between Set, List and Tuple Data Structures; Modifying Tuples; Combining Set, List and Tuple Data Structures; Creating Dictionaries from Other Data Structures; Summary

Chapter 8. Reading and Writing Text Files
Opening a File; Writing a File; Reading a File; Appending to a File; File Operations Using the With Statement; File and Directory Operations; Reading JSON; Writing JSON; Summary

Chapter 9. Functional Programming Primer
What is Functional Programming?; Benefits of Functional Programming; Functions as Data; Using the Map Function; Using the Filter Function; Lambda Expressions; List.sort() Using a Lambda Expression; Difference Between Simple Loops and map/filter Type Functions; Additional Functions; General Rules for Creating Functions; Summary

Chapter 10. Introduction to Apache Spark
What is Apache Spark; A Short History of Spark; Where to Get Spark?
The Spark Platform; Spark Logo; Common Spark Use Cases; Languages Supported by Spark; Running Spark on a Cluster; The Driver Process; Spark Applications; Spark Shell; The spark-submit Tool; The spark-submit Tool Configuration; The Executor and Worker Processes; The Spark Application Architecture; Interfaces with Data Storage Systems; Limitations of Hadoop's MapReduce; Spark vs MapReduce; Spark as an Alternative to Apache Tez; The Resilient Distributed Dataset (RDD); Datasets and DataFrames; Spark Streaming (Micro-batching); Spark SQL; Example of Spark SQL; Spark Machine Learning Library; GraphX; Spark vs R; Summary

Chapter 11. The Spark Shell
The Spark Shell; The Spark v.2+ Command-Line Shells; The Spark Shell UI; Spark Shell Options; Getting Help; Jupyter Notebook Shell Environment; Example of a Jupyter Notebook Web UI (Databricks Cloud); The Spark Context (sc) and Spark Session (spark); Creating a Spark Session Object in Spark Applications; The Shell Spark Context Object (sc); The Shell Spark Session Object (spark); Loading Files; Saving Files; Summary

Chapter 12. Spark RDDs
The Resilient Distributed Dataset (RDD); Ways to Create an RDD; Supported Data Types; RDD Operations; RDDs are Immutable; Spark Actions; RDD Transformations; Other RDD Operations; Chaining RDD Operations; RDD Lineage; The Big Picture; What May Go Wrong; Checkpointing RDDs; Local Checkpointing; Parallelized Collections; More on the parallelize() Method; The Pair RDD; Where do I use Pair RDDs?; Example of Creating a Pair RDD with map; Example of Creating a Pair RDD with keyBy; Miscellaneous Pair RDD Operations; RDD Caching; RDD Persistence; Summary

Chapter 13. Parallel Data Processing with Spark
Running Spark on a Cluster; Data Partitioning; Data Partitioning Diagram; Single Local File System RDD Partitioning; Multiple File RDD Partitioning; Special Cases for Small-sized Files; Parallel Data Processing of Partitions; Spark Application, Jobs, and Tasks; Stages and Shuffles; The "Big Picture"; Summary

Chapter 14.
Shared Variables in Spark
Shared Variables in Spark; Broadcast Variables; Creating and Using Broadcast Variables; Example of Using Broadcast Variables; Problems with Global Variables; Example of the Closure Problem; Accumulators; Creating and Using Accumulators; Example of Using Accumulators (Scala Example); Example of Using Accumulators (Python Example); Custom Accumulators; Summary

Chapter 15. Introduction to Spark SQL
What is Spark SQL?; Uniform Data Access with Spark SQL; Hive Integration; Hive Interface; Integration with BI Tools; What is a DataFrame?; Creating a DataFrame in PySpark; Commonly Used DataFrame Methods and Properties in PySpark; Grouping and Aggregation in PySpark; The "DataFrame to RDD" Bridge in PySpark; The SQLContext Object; Examples of Spark SQL / DataFrame (PySpark Example); Converting an RDD to a DataFrame Example; Example of Reading / Writing a JSON File; Using JDBC Sources; JDBC Connection Example; Performance, Scalability, and Fault-tolerance of Spark SQL; Summary

Chapter 16. Repairing and Normalizing Data
Repairing and Normalizing Data; Dealing with Missing Data; Sample Data Set; Getting Info on Null Data; Dropping a Column; Interpolating Missing Data in pandas; Replacing the Missing Values with the Mean Value; Scaling (Normalizing) the Data; Data Preprocessing with scikit-learn; Scaling with the scale() Function; The MinMaxScaler Object; Summary

Chapter 17. Data Grouping and Aggregation in Python
Data Aggregation and Grouping; Sample Data Set; The pandas.core.groupby.SeriesGroupBy Object; Grouping by Two or More Columns; Emulating SQL's WHERE Clause; Pivot Tables; Cross-Tabulation; Summary

Lab Exercises
Lab 1. Introduction to Python
Lab 2. Creating Scripts
Lab 3. Variables in Python
Lab 4. Collections
Lab 5. Control Statements and Loops
Lab 6. Functions in Python
Lab 7. Reading and Writing Text Files
Lab 8. Functional Programming
Lab 9. The PySpark Shell
Lab 10. Data Transformation with PySpark
Lab 11. RDD Performance Improvement Techniques with PySpark
Lab 12.
Spark SQL with PySpark
Lab 13. Repairing and Normalizing Data
Lab 14. Data Grouping and Aggregation

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.
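Chapter 8 covers reading and writing files with the with statement, plus JSON handling. As a flavor of that material, here is a minimal stdlib-only sketch; the file name and record contents are invented for illustration and are not the course's lab code:

```python
import json
import os
import tempfile

# A record to persist (contents invented for illustration)
record = {"course": "WA2914", "days": 3}

path = os.path.join(tempfile.gettempdir(), "course.json")

# Writing JSON: the with statement closes the file automatically
with open(path, "w") as f:
    json.dump(record, f)

# Reading JSON back from the same file
with open(path) as f:
    loaded = json.load(f)

print(loaded["course"])  # WA2914

# File and directory operations: clean up the temporary file
os.remove(path)
```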
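Chapter 9 introduces map, filter, and lambda expressions, including list.sort() with a lambda key. A minimal sketch of those topics (the temperature values are invented for illustration):

```python
# Sample data (invented for illustration): temperatures in Fahrenheit
temps = [72, 65, 80, 59, 91]

# map: apply a function to every element (Fahrenheit to Celsius)
celsius = list(map(lambda f: round((f - 32) * 5 / 9, 1), temps))

# filter: keep only elements matching a predicate
hot = list(filter(lambda f: f > 70, temps))

# list.sort() with a lambda key: sort descending
temps.sort(key=lambda f: -f)

print(celsius)  # [22.2, 18.3, 26.7, 15.0, 32.8]
print(hot)      # [72, 80, 91]
print(temps)    # [91, 80, 72, 65, 59]
```

The same results could be produced with simple for loops; the course's "Difference Between Simple Loops and map/filter Type Functions" topic contrasts the two styles.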
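Chapter 16 covers replacing missing values with the mean and scaling (normalizing) data. A small pandas sketch of those two ideas follows; the column names and values are invented, and the min-max scaling is done by hand here while the course itself also demonstrates scikit-learn's MinMaxScaler:

```python
import numpy as np
import pandas as pd

# Sample data set with a missing value (invented for illustration)
df = pd.DataFrame({"age": [25.0, np.nan, 35.0], "score": [88, 92, 79]})

# Getting info on null data: count of nulls per column
print(df.isnull().sum())

# Replacing the missing values with the column mean
df["age"] = df["age"].fillna(df["age"].mean())
print(df["age"].tolist())  # [25.0, 30.0, 35.0]

# Min-max scaling to the 0-1 range, computed by hand
score_range = df["score"].max() - df["score"].min()
df["score_scaled"] = (df["score"] - df["score"].min()) / score_range
print(df["score_scaled"].tolist())
```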
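Chapter 17 covers grouping, aggregation, emulating SQL's WHERE clause, and pivot tables in pandas. A compact sketch of those operations (the department/region data is invented for illustration):

```python
import pandas as pd

# Sample data set (invented for illustration)
df = pd.DataFrame({
    "dept":   ["sales", "sales", "eng", "eng"],
    "region": ["east",  "west",  "east", "east"],
    "pay":    [50, 60, 90, 100],
})

# Grouping and aggregation on one column
print(df.groupby("dept")["pay"].mean())

# Grouping by two or more columns
print(df.groupby(["dept", "region"])["pay"].sum())

# Emulating SQL's WHERE clause with boolean indexing
print(df[df["pay"] > 55])

# A pivot table: dept rows, region columns, summed pay
print(df.pivot_table(values="pay", index="dept", columns="region", aggfunc="sum"))
```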