Big Data and Analytics for Business Users Training

Course #: WA2186

Courseware: Available for sale

Data is one of the most valuable assets that your organization possesses.  Every day you are creating more data and potentially passing up opportunities to harvest that data and use it to accelerate the achievement of your organization’s strategic objectives.  Big Data and Analytics represent an emerging trend around harvesting, analyzing, and capitalizing on the wealth of data that is within the grasp of your enterprise.

“For every 100 open Big Data jobs, there are only two qualified candidates” - fastcompany.com

This one-day primer introduces Cloud Computing, Big Data, and the emerging discipline of Data Analytics.  Attention will be given to the three V’s of Big Data (Volume, Velocity, and Variety) as well as the fourth V, Value.  You’ll learn about these critical elements and the value proposition that these capabilities provide.  What processes, tools, and personnel will be needed to take advantage of this sea change in information management?  This course will equip you to understand your customers better and to deliver more value today.

Topics

  • Cloud Computing Basics
  • Introduction to Big Data
  • Understanding Data Analytics
  • Understanding Predictive Analytics
  • Basics of Analytical Modeling
  • Unpacking Value, Volume, Velocity, and Variety
  • Organizational Considerations
  • Recommended Next Steps

Audience

Managers, Analysts, Architects, and Team Leads

Pre-requisites

None

Duration

1 day

Outline of Big Data and Analytics for Business Users Training

Chapter 1. Defining Big Data

  • In-Class Discussion
  • Gartner's Definition of Big Data
  • More Definitions of Big Data
  • Transforming Data into Business Information
  • Challenges Posed by Big Data
  • Processing Big Data
  • Apache Hadoop
  • The Cloud and Big Data
  • The CAP Theorem
  • Summary

Chapter 2. Hadoop Overview

  • The Client-Server Processing Pattern
  • Apache Hadoop
  • Apache Hadoop Logo
  • Typical Hadoop Applications
  • Hadoop Clusters
  • Hadoop Distributions
  • Hadoop's Main Components
  • HDFS
  • HDFS Blocks
  • YARN
  • Hadoop-based Systems for Data Analysis
  • MapReduce
  • Similarity with SQL Aggregation Operations
  • Distributed Computing Economics
  • Discussion: Divide and Conquer
  • Apache Pig
  • Pig Latin
  • Running Pig
  • Pig Latin Script Example
  • What is Hive?
  • Hive's Value Proposition
  • Who uses Hive?
  • What Hive Does Not Have
  • HiveQL
  • Working with Hive Tables
  • What is HBase?
  • HBase vs RDBMS
  • Interfacing with HBase
  • HBase Table Design Digest
  • A Cell's Value Versioning
  • Creating and Populating a Table in HBase Shell
  • Getting a Cell's Value
  • Counting Rows in an HBase Table
  • Summary

Chapter 3. Big Data Analytics in the Cloud

  • Data is King
  • Big Data Stores in the Cloud
  • Example: AWS Simple Storage Service (S3)
  • MapReduce (and Hadoop) in the Cloud
  • Information and Data Security
  • Data-at-rest Security Examples
  • Example of Object Encryption in S3
  • One S3 Use Case: Backup and Archiving
  • Data Analytics Services in the Cloud
  • Analytics Services with AWS
  • AWS EMR: Software Configuration Screen
  • AWS EMR: Hardware Configuration Screen
  • Big Data Analytics Solutions from Google Cloud
  • Google Data Processing and Analytics Pipelines
  • Google BigQuery
  • Machine Learning
  • Microsoft Azure ML Studio
  • Machine Learning Pipeline
  • Summary

Chapter 4. Techniques for Making Big Data Small

  • What is Data Science?
  • Data Science, Machine Learning, AI?
  • Making Big Data Small
  • Descriptive Statistics
  • Correlation
  • Reducing the Number of Data Attributes
  • Lasso Regularization
  • Sampling Examples
  • Data Compression
  • Summary

Chapter 5. Introduction to Apache Spark

  • What is Apache Spark?
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Shell
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • Example: Using Random Forests with Spark MLlib
  • The Output (the “Confusion” matrix)
  • Dumping the Trained Model
  • Clustering
  • Finding Centroids Example
  • Using kMeans Module with Spark MLlib
  • Printing the Centroids
  • GraphX
  • Summary

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.