Big Data Training: Data Science for Solution Architects

Course#: WA2393

This training course helps Solution Architects and other IT practitioners understand the value proposition, methodology, and techniques of the emerging discipline of Data Science. The class also introduces students to a number of production-ready technologies and capabilities that enable enterprises to build cost-efficient Big Data processing solutions.


This intensive training course presents the theoretical and technical aspects of Data Science and Business Analytics. It covers fundamental and advanced concepts and methods for deriving business insights from raw data using cost-effective data processing solutions. Hands-on labs help attendees reinforce their understanding of the material.


  • Applied Data Science and Business Analytics
  • Algorithms, Techniques and Common Analytical Methods
  • NoSQL and Big Data Systems Overview
  • MapReduce
  • Big Data Business Intelligence and Analytics
  • Visualizing and Reporting Processed Results
  • Data Analysis with R
  • Hadoop Programming Ecosystem


Audience: Enterprise Architects, Solution Architects, Information Technology Architects, Business Analysts, Senior Developers, and Team Leads


Participants should have a general knowledge of statistics and programming.


Duration: 4 days

Outline of WA2393 Big Data Training: Data Science for Solution Architects

Chapter 1. Applied Data Science

  • What is Data Science?
  • Data Science Ecosystem
  • Data Mining vs. Data Science
  • Business Analytics vs. Data Science
  • Who is a Data Scientist?
  • Data Science Skill Sets Venn Diagram
  • Data Scientists at Work
  • Examples of Data Science Projects
  • An Example of a Data Product
  • Applied Data Science at Google
  • Data Science Gotchas
  • Summary

Chapter 2. Data Science Algorithms and Analytical Methods

  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • Decision Trees (SL)
  • Naive Bayes Classifier (SL)
  • Naive Bayesian Probabilistic Model in a Nutshell
  • Unsupervised Learning Type: Clustering
  • K-Means Clustering (UL)
  • K-Means Clustering in a Nutshell
  • Time-Series Analysis
  • Decomposing Time-Series
  • Monte-Carlo Simulation (Method)
  • Who Uses Monte-Carlo Simulation?
  • Monte-Carlo Simulation in a Nutshell
  • Summary

Chapter 3. Introduction to R

  • Introduction
  • Positioning of R in the Data Science Arena
  • R Integrated Development Environments
  • General Notes on R Commands and Statements
  • R Data Structures
  • R Objects and Workspace
  • Assignment Operators
  • Assignment Example
  • Arithmetic Operators
  • Logical Operators
  • System Date and Time
  • Operations
  • User-defined Functions
  • User-defined Function Example
  • R Code Example
  • Control Statements
  • Conditional Execution
  • Repetitive Execution
  • Built-in Functions
  • Reading Data from Files into Vectors
  • Example of Reading Data from a File
  • Writing Data to a File
  • Example of Writing Data to a File
  • Matrix Data Structure
  • Creating Matrices
  • Working with Data Frames
  • Matrices vs Data Frames
  • A Data Frame Sample
  • Accessing Data Cells
  • Getting Info About a Data Frame
  • Selecting Columns in Data Frames
  • Selecting Rows in Data Frames
  • Getting a Subset of a Data Frame
  • Sorting (ordering) Data in Data Frames by Attribute(s)
  • Applying Functions to Matrices and Data Frames
  • Using the apply() Function
  • Example of Using apply()
  • Listing Objects in Workspace
  • Saving Your Workspace
  • Loading Your Workspace
  • Batch (Unattended) Processing
  • Importing Data into R
  • Exporting Data from R
  • Standard R Packages
  • Extending R
  • CRAN Page
  • Summary

Chapter 4. R Statistical Computing Features

  • Statistical Computing Features
  • Descriptive Statistics
  • Basic Statistical Functions
  • Examples of Using Basic Statistical Functions
  • Non-uniformity of a Probability Distribution
  • Writing Your Own skew and kurtosis Functions
  • Generating Normally Distributed Random Numbers
  • Generating Uniformly Distributed Random Numbers
  • Using the summary() Function
  • Math Functions Used in Data Analysis
  • Examples of Using Math Functions
  • Correlations
  • Correlation Example
  • Testing Correlation Coefficient for Significance
  • The cor.test() Function
  • The cor.test() Example
  • Regression Analysis
  • Types of Regression
  • Simple Linear Regression Model
  • Least-Squares Method (LSM)
  • LSM Assumptions
  • Fitting Linear Regression Models in R
  • Example of Using lm()
  • Confidence Intervals for Model Parameters
  • Example of Using lm() with a Data Frame
  • Regression Models in Excel
  • Multiple Regression Analysis
  • Finding the Best-Fitting Regression Model
  • Comparing Regression Models
  • Summary

Chapter 5. Defining Big Data

  • Transforming Data into Business Information
  • Quality of Data
  • Gartner's Definition of Big Data
  • More Definitions of Big Data
  • Processing Big Data
  • Challenges Posed by Big Data
  • The Cloud and Big Data
  • The Business Value of Big Data
  • Big Data: Hype or Reality?
  • Big Data Quiz
  • Big Data Quiz Answers
  • Summary

Chapter 6. What is NoSQL?

  • Limitations of Relational Databases
  • Limitations of Relational Databases (Cont'd)
  • Defining NoSQL
  • What are NoSQL (Not Only SQL) Databases?
  • The Past and Present of the NoSQL World
  • NoSQL Database Properties
  • NoSQL Benefits
  • NoSQL Database Storage Types
  • The CAP Theorem
  • Mechanisms to Guarantee a Single CAP Property
  • Limitations of NoSQL Databases
  • Big Data Sharding
  • Sharding Example
  • Quiz
  • Quiz Answers
  • Summary

Chapter 7. MapReduce Overview

  • MapReduce Defined
  • Google's MapReduce
  • The Map Phase of MapReduce
  • The Reduce Phase of MapReduce
  • MapReduce Explained
  • MapReduce Word Count Job
  • MapReduce Shared-Nothing Architecture
  • Similarity with SQL Aggregation Operations
  • Example of Map & Reduce Operations using JavaScript
  • Problems Suitable for Solving with MapReduce
  • Typical MapReduce Jobs
  • Fault-tolerance of MapReduce
  • Distributed Computing Economics
  • MapReduce Systems
  • Summary

Chapter 8. Introduction to MongoDB

  • MongoDB
  • MongoDB Features (Cont'd)
  • MongoDB's Logo
  • Positioning of MongoDB
  • MongoDB Limitations
  • MongoDB Operational Intelligence
  • MongoDB Use Cases
  • MongoDB Data Model
  • The _id Primary Key Field Considerations
  • Terminology
  • MongoDB Data Model
  • Data Modeling in RDBMS
  • Data Modeling in MongoDB
  • MongoDB Data Modeling
  • A Sample JSON Document Matching the Schema
  • Data Lifecycle Management
  • Data Lifecycle Management: TTL
  • Data Lifecycle Management: Capped Collections
  • MongoDB Query Language (QL)
  • The find and findOne Methods
  • A MongoDB QL Example
  • Data Inserts
  • Creating an Index
  • MongoDB vs Apache CouchDB
  • Summary

Chapter 9. Hadoop Overview

  • Apache Hadoop
  • Apache Hadoop Logo
  • Typical Hadoop Applications
  • Hadoop Clusters
  • Hadoop Design Principles
  • Hadoop's Core Components
  • Hadoop Simple Definition
  • High-Level Hadoop Architecture
  • Hadoop-based Systems for Data Analysis
  • Hadoop Caveats
  • Summary

Chapter 10. Hadoop Distributed File System Overview

  • Hadoop Distributed File System
  • Data Blocks
  • Data Block Replication Example
  • HDFS NameNode Directory Diagram
  • Accessing HDFS
  • Examples of HDFS Commands
  • Client Interactions with HDFS for the Read Operation
  • Read Operation Sequence Diagram
  • Client Interactions with HDFS for the Write Operation
  • Communication inside HDFS
  • Summary

Chapter 11. MapReduce with Hadoop

  • Hadoop's MapReduce
  • MapReduce v1 ("Classic MapReduce")
  • JobTracker and TaskTracker
  • YARN (MapReduce v2)
  • MapReduce Programming Options
  • Java MapReduce API
  • The Structure of a Java MapReduce Program
  • The Mapper Class
  • The Reducer Class
  • The Driver Class
  • Compiling Classes
  • Running the MapReduce Job
  • The Structure of a Single MapReduce Program
  • Combiner Pass (Optional)
  • Hadoop's Streaming MapReduce
  • Python Word Count Mapper Program Example
  • Python Word Count Reducer Program Example
  • Setting up Java Classpath for Streaming Support
  • Streaming Use Cases
  • The Streaming API vs Java MapReduce API
  • Amazon Elastic MapReduce
  • Summary

Chapter 12. Apache Pig Scripting Platform

  • What is Pig?
  • Pig Latin
  • Apache Pig Logo
  • Pig Execution Modes
  • Local Execution Mode
  • MapReduce Execution Mode
  • Running Pig
  • Running Pig in Batch Mode
  • What is Grunt?
  • Pig Latin Statements
  • Pig Programs
  • Pig Latin Script Example
  • SQL Equivalent
  • Differences between Pig and SQL
  • Statement Processing in Pig
  • Comments in Pig
  • Supported Simple Data Types
  • Supported Complex Data Types
  • Arrays
  • Defining Relation's Schema
  • The bytearray Generic Type
  • Using Field Delimiters
  • Referencing Fields in Relations
  • Summary

Chapter 13. Apache Pig Relational and Eval Operators

  • Pig Relational Operators
  • Example of Using the JOIN Operator
  • Example of Using the Order By Operator
  • Caveats of Using Relational Operators
  • Pig Eval Functions
  • Caveats of Using Eval Functions (Operators)
  • Example of Using Single-column Eval Operations
  • Example of Using Eval Operators For Global Operations
  • Summary

Chapter 14. Apache Pig Performance

  • Apache Pig Performance
  • Performance Enhancer - Use the Right Schema Type
  • Performance Enhancer - Apply Data Filters
  • Use the PARALLEL Clause
  • Examples of the PARALLEL Clause
  • Performance Enhancer - Limiting the Data Sets
  • Displaying Execution Plan
  • Summary

Chapter 15. Hive

  • What is Hive?
  • Apache Hive Logo
  • Hive's Value Proposition
  • Who uses Hive?
  • Hive's Main Sub-Systems
  • Hive Features
  • Hive Architecture
  • HiveQL
  • Where are the Hive Tables Located?
  • Hive Command-line Interface (CLI)
  • Summary

Chapter 16. Hive Command-line Interface

  • Hive Command-line Interface (CLI)
  • The Hive Interactive Shell
  • Running Host OS Commands from the Hive Shell
  • Interfacing with HDFS from the Hive Shell
  • The Hive in Unattended Mode
  • The Hive CLI Integration with the OS Shell
  • Executing HiveQL Scripts
  • Comments in Hive Scripts
  • Variables and Properties in Hive CLI
  • Setting Properties in CLI
  • Example of Setting Properties in CLI
  • Hive Namespaces
  • Using the SET Command
  • Setting Properties in the Shell
  • Setting Properties for the New Shell Session
  • Summary

Chapter 17. Hive Data Definition Language

  • Hive Data Definition Language
  • Creating Databases in Hive
  • Using Databases
  • Creating Tables in Hive
  • Supported Data Type Categories
  • Common Primitive Types
  • Example of the CREATE TABLE Statement
  • The STRUCT Type
  • Table Partitioning
  • Table Partitioning on Multiple Columns
  • Viewing Table Partitions
  • Row Format
  • Data Serializers / Deserializers
  • File Format Storage
  • More on File Formats
  • The EXTERNAL DDL Parameter
  • Example of Using EXTERNAL
  • Creating an Empty Table
  • Dropping a Table
  • Table / Partition(s) Truncation
  • Alter Table/Partition/Column
  • Views
  • Create View Statement
  • Why Use Views?
  • Restricting Amount of Viewable Data
  • Examples of Restricting Amount of Viewable Data
  • Creating and Dropping Indexes
  • Describing Data
  • Summary

Chapter 18. Hive Select Statement

  • HiveQL
  • The SELECT Statement Syntax
  • The WHERE Clause
  • Examples of the WHERE Statement
  • Partition-based Queries
  • Example of an Efficient SELECT Statement
  • The DISTINCT Clause
  • Supported Numeric Operators
  • Built-in Mathematical Functions
  • Built-in Aggregate Functions
  • Built-in Statistical Functions
  • Other Useful Built-in Functions
  • The GROUP BY Clause
  • The HAVING Clause
  • The LIMIT Clause
  • The ORDER BY Clause
  • The JOIN Clause
  • The CASE … Clause
  • Example of CASE … Clause
  • Summary

Chapter 19. Apache Sqoop

  • What is Sqoop?
  • Apache Sqoop Logo
  • Sqoop Import / Export
  • Sqoop Help
  • Examples of Using Sqoop Commands
  • Data Import Example
  • Fine-tuning Data Import
  • Controlling the Number of Import Processes
  • Data Splitting
  • Helping Sqoop Out
  • Example of Executing Sqoop Load in Parallel
  • A Word of Caution: Avoid Complex Free-Form Queries
  • Using Direct Export from Databases
  • Example of Using Direct Export from MySQL
  • More on Direct Mode Import
  • Changing Data Types
  • Example of Default Types Overriding
  • File Formats
  • The Apache Avro Serialization System
  • Binary vs Text
  • More on the SequenceFile Binary Format
  • Generating the Java Table Record Source Code
  • Data Export from HDFS
  • Export Tool Common Arguments
  • Data Export Control Arguments
  • Data Export Example
  • Using a Staging Table
  • INSERT and UPDATE Statements
  • INSERT Operations
  • UPDATE Operations
  • Example of the Update Operation
  • Failed Exports
  • Summary

Chapter 20. Apache HBase

  • What is HBase?
  • HBase Design
  • HBase Features
  • The Write-Ahead Log (WAL) and MemStore
  • HBase vs RDBMS
  • HBase vs Apache Cassandra
  • Interfacing with HBase
  • HBase Thrift And REST Gateway
  • HBase Table Design
  • Column Families
  • A Cell's Value Versioning
  • Timestamps
  • Accessing Cells
  • HBase Table Design Digest
  • Table Horizontal Partitioning with Regions
  • HBase Compaction
  • Loading Data in HBase
  • HBase Shell
  • HBase Shell Command Groups
  • Creating and Populating a Table in HBase Shell
  • Getting a Cell's Value
  • Counting Rows in an HBase Table
  • Summary

Address                   Start Date    End Date
Instructor Led Virtual    05/08/2017    05/11/2017
Instructor Led Virtual    07/31/2017    08/03/2017
Instructor Led Virtual    09/05/2017    09/08/2017
Instructor Led Virtual    10/16/2017    10/19/2017

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.