Duration

Three days

Outline for Machine Learning with Python Training

Chapter 1. Python for Data Science

  • In-Class Discussion
  • Python Data Science-Centric Libraries
  • NumPy
  • NumPy Arrays
  • Select NumPy Operations
  • SciPy
  • pandas
  • Creating a pandas DataFrame
  • Fetching and Sorting Data
  • Scikit-learn
  • Matplotlib
  • Seaborn
  • Python Dev Tools and REPLs
  • IPython
  • Jupyter
  • Jupyter Operation Modes
  • Jupyter Common Commands
  • Anaconda
  • Summary

Chapter 2. Defining Data Science

  • What is Data Science?
  • Data Science, Machine Learning, AI?
  • The Data-Related Roles
  • The Data Science Ecosystem
  • Tools of the Trade
  • Who is a Data Scientist?
  • Data Scientists at Work
  • Examples of Data Science Projects
  • An Example of a Data Product
  • Applied Data Science at Google
  • Data Science Gotchas
  • Summary

Chapter 3. Data Processing Phases

  • Typical Data Processing Pipeline
  • Data Discovery Phase
  • Data Harvesting Phase
  • Data Priming Phase
  • Exploratory Data Analysis
  • Model Planning Phase
  • Model Building Phase
  • Communicating the Results
  • Production Roll-out
  • Data Logistics and Data Governance
  • Data Processing Workflow Engines
  • Apache Airflow
  • Data Lineage and Provenance
  • Apache NiFi
  • Summary

Chapter 4. Descriptive Statistics Computing Features in Python

  • Descriptive Statistics
  • Non-uniformity of a Probability Distribution
  • Using NumPy for Calculating Descriptive Statistics Measures
  • Finding Min and Max in NumPy
  • Using pandas for Calculating Descriptive Statistics Measures
  • Correlation
  • Regression and Correlation
  • Covariance
  • Getting Pairwise Correlation and Covariance Measures
  • Finding Min and Max in pandas DataFrame
  • Summary

Chapter 5. Repairing and Normalizing Data

  • Repairing and Normalizing Data
  • Dealing with the Missing Data
  • Sample Data Set
  • Getting Info on Null Data
  • Dropping a Column
  • Interpolating Missing Data in pandas
  • Replacing the Missing Values with the Mean Value
  • Scaling (Normalizing) the Data
  • Data Preprocessing with scikit-learn
  • Scaling with the scale() Function
  • The MinMaxScaler Object
  • Summary

Chapter 6. Data Visualization in Python

  • Data Visualization
  • Data Visualization in Python
  • Matplotlib
  • Getting Started with matplotlib
  • The matplotlib.pyplot.plot() Function
  • The matplotlib.pyplot.bar() Function
  • The matplotlib.pyplot.pie() Function
  • Subplots
  • Using the matplotlib.gridspec.GridSpec Object
  • The matplotlib.pyplot.subplot() Function
  • Figures
  • Saving Figures to a File
  • Seaborn
  • Getting Started with seaborn
  • Histograms and KDE
  • Plotting Bivariate Distributions
  • Scatter Plots in seaborn
  • Pair Plots in seaborn
  • Heatmaps
  • ggplot
  • Summary

Chapter 7. Data Science and ML Algorithms in scikit-learn

  • In-Class Discussion
  • Types of Machine Learning
  • Terminology: Features and Observations
  • Representing Observations
  • Terminology: Labels
  • Terminology: Continuous and Categorical Features
  • Continuous Features
  • Categorical Features
  • Common Distance Metrics
  • The Euclidean Distance
  • What is a Model
  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choosing the Right Algorithm
  • The scikit-learn Package
  • scikit-learn Estimators, Models, and Predictors
  • Model Evaluation
  • The Error Rate
  • Confusion Matrix
  • The Binary Classification Confusion Matrix
  • Multi-class Classification Confusion Matrix Example
  • ROC Curve
  • Example of an ROC Curve
  • The AUC Metric
  • Feature Engineering
  • Scaling of the Features
  • Feature Blending (Creating Synthetic Features)
  • The 'One-Hot' Encoding Scheme
  • Example of 'One-Hot' Encoding Scheme
  • Bias-Variance (Underfitting vs Overfitting) Trade-off
  • The Modeling Error Factors
  • One Way to Visualize Bias and Variance
  • Underfitting vs Overfitting Visualization
  • Balancing Off the Bias-Variance Ratio
  • Regularization in scikit-learn
  • Regularization, Take Two
  • Dimensionality Reduction
  • PCA and isomap
  • The Advantages of Dimensionality Reduction
  • The LIBSVM Format
  • Life-cycles of Machine Learning Development
  • Data Splitting into Training and Test Datasets
  • ML Model Tuning Visually
  • Data Splitting in scikit-learn
  • Cross-Validation Technique
  • Hands-on Exercise
  • Classification (Supervised ML) Examples
  • Classifying with k-Nearest Neighbors
  • k-Nearest Neighbors Algorithm
  • Hands-on Exercise
  • Regression Analysis
  • Regression vs Correlation
  • Regression vs Classification
  • Simple Linear Regression Model
  • Linear Regression Illustration
  • Least-Squares Method (LSM)
  • Gradient Descent Optimization
  • Multiple Regression Analysis
  • Evaluating Regression Model Accuracy
  • The R² Model Score
  • The MSE Model Score
  • Logistic Regression (Logit)
  • Interpreting Linear Logistic Regression Results
  • Decision Trees
  • Decision Tree Terminology
  • Properties of Decision Trees
  • Decision Tree Classification in the Context of Information Theory
  • The Simplified Decision Tree Algorithm
  • Using Decision Trees
  • Random Forests
  • Hands-On Exercise
  • Hands-on Exercise
  • Support Vector Machines (SVMs)
  • Naive Bayes Classifier (SL)
  • Naive Bayesian Probabilistic Model in a Nutshell
  • Bayes Formula
  • Classification of Documents with Naive Bayes
  • Unsupervised Learning Type: Clustering
  • Clustering Examples
  • k-Means Clustering (UL)
  • k-Means Clustering in a Nutshell
  • k-Means Characteristics
  • Global vs Local Minimum Explained
  • Hands-On Exercise
  • XGBoost
  • Gradient Boosting
  • Hands-On Exercise
  • A Better Algorithm or More Data?
  • Summary

Chapter 8. AI Systems and Platforms Overview

  • Heuristics and Expert Systems
  • What is AI?
  • AI, Machine Learning (ML), and Deep Learning
  • Neural Networks in AI
  • Deep Learning Neural Networks
  • TensorFlow
  • Keras
  • Colab Notebooks
  • PyTorch
  • ML at Scale: Python on Spark - PySpark
  • AWS IoT Service
  • AWS ML Services
  • SageMaker
  • DeepLens
  • The DeepLens Device (an IoT Device)
  • A DeepLens Use Case
  • Rekognition
  • Rekognition's Object and Scene Detection Demo
  • AWS ML Algorithm Marketplace

Chapter 9. Text Mining and NLP Overview

  • What is Text Mining?
  • The Common Text Mining Tasks
  • What is Natural Language Processing (NLP)?
  • Some of the NLP Use Cases
  • Machine Learning in Text Mining and NLP
  • Machine Learning in NLP
  • TF-IDF
  • The Feature Hashing Trick
  • Stemming
  • Example of Stemming
  • Stop Words
  • Popular Text Mining and NLP Libraries and Packages
  • Google Natural Language Cloud Service
  • Trying it Out
  • How Google NL Service Works
  • Google Translate Service
  • Comprehend
  • How Comprehend Works
  • Comprehend in the AWS Management Console
  • Use Cases for Comprehend
  • Lex
  • Polly
  • Polly's Text-to-Speech Dashboard
  • Example of Using Polly's AWS CLI
  • Transcribe
  • Translate