Introduction to Big Data and NoSQL Training

Course #: WA2192

Courseware: Available for sale

We live in an information age in which business success depends on an organization's ability to convert raw data from various sources into high-grade business information.

Many organizations are overwhelmed by the sheer volume of information they have to process in order to stay competitive. As data volumes grow exponentially, traditional database systems may become prohibitively expensive to scale or prove unsuitable for the job. At that point, the data has become Big Data.

This course provides an introduction to Big Data and to NoSQL (Not Only SQL) database systems. It methodically explores the fundamental concepts and ideas behind Big Data and NoSQL technologies and demystifies many of the buzzwords. Hands-on labs help attendees reinforce their theoretical knowledge of the subject.

Topics

  • Defining Big Data
  • Big Data Stores Overview
  • NoSQL
  • Big Data Business Intelligence and Analytics
  • Real World Case Studies
  • Adopting NoSQL

Audience

General audience including business and technology team leadership

Prerequisites

Basic programming skills, some knowledge of SQL

Duration

1 day

Outline of Introduction to Big Data and NoSQL Training

Chapter 1. Introduction to NoSQL Systems

  • Gartner's Definition of Big Data
  • The V3 Properties
  • Limitations of Relational Databases
  • Limitations of Relational Databases (Cont'd)
  • What are NoSQL (Not Only SQL) Databases?
  • What are NoSQL Databases?
  • The Past and Present of the NoSQL World
  • NoSQL Database Properties
  • NoSQL Benefits
  • Use Cases for NoSQL Database Systems
  • NoSQL Database Storage Types
  • The CAP Theorem
  • Mechanisms to Guarantee a Single CAP Property
  • NoSQL Systems CAP Triangle
  • Limitations of NoSQL Databases
  • Mix-and-Match Approach
  • Big Data Sharding
  • Sharding Example
  • Google BigTable
  • BigTable-based Applications
  • BigTable Design
  • Barriers to Adoption
  • Dismantling Barriers to Adoption
  • Industry Trends
  • NoSQL Technology Adoption Action Plan
  • Quiz
  • Quiz Answers
  • Summary
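
To give a flavor of the "Big Data Sharding" and "Sharding Example" topics listed above, here is a minimal sketch of hash-based sharding in Python. It is an illustration only and not taken from the courseware: the function names `shard_for` and `distribute`, the use of MD5 as the stable hash, and the shard count are all assumptions made for this example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    MD5 is used here only because it gives the same value on every
    machine and every run (an assumption of this sketch, not a
    recommendation for production systems).
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard each one is assigned to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical customer ids spread over four shards.
    placement = distribute(["cust-1", "cust-2", "cust-3", "cust-4"], 4)
    for shard, keys in placement.items():
        print(shard, keys)
```

Note that simple modulo placement like this moves most keys when the number of shards changes, which is one reason real systems often use more elaborate partitioning schemes.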

Chapter 2. Introduction to Hadoop

  • The Client-Server Processing Pattern
  • Apache Hadoop
  • Apache Hadoop Logo
  • Typical Hadoop Applications
  • Hadoop Clusters
  • Hadoop Distributions
  • Hadoop's Main Components
  • Hadoop Distributed File System (HDFS)
  • HDFS Considerations
  • Data Blocks
  • HDFS NameNode Directory Diagram
  • HDFS Balancing
  • Accessing HDFS
  • Examples of HDFS Commands
  • Other Supported File Systems
  • YARN
  • Hadoop-based Systems for Data Analysis
  • MapReduce
  • Similarity with SQL Aggregation Operations
  • MapReduce Word Count Example
  • Distributed Computing Economics
  • Discussion: Divide and Conquer
  • Apache Pig
  • Pig Latin
  • Running Pig
  • Pig Latin Script Example
  • What is Hive?
  • Hive's Value Proposition
  • Who uses Hive?
  • What Hive Does Not Have
  • HiveQL
  • Working with Hive Tables
  • Summary
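
As a taste of the "MapReduce Word Count Example" topic listed above, the following is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It does not use the Hadoop API and is not taken from the courseware; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data NoSQL"]))
```

The grouping and summing steps play roughly the role that GROUP BY and COUNT play in a SQL aggregation, which is the comparison the "Similarity with SQL Aggregation Operations" topic refers to.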

Chapter 3. Apache HBase

  • What is HBase?
  • HBase Design
  • HBase Master (HMaster)
  • Sparse Data Sets
  • Regions and Region Servers
  • HBase Features
  • HBase High Availability
  • The Write-Ahead Log (WAL) and MemStore
  • HBase vs RDBMS
  • HBase vs RDBMS (Cont'd)
  • Interfacing with HBase
  • HBase Thrift and REST Gateway
  • HBase Table Design
  • Column Families
  • A Cell's Value Versioning
  • Timestamps
  • Accessing Cells
  • HBase Table Design Digest
  • The Conceptual View of an HBase Table
  • HBase Compaction
  • Loading Data in HBase
  • Column Families Notes
  • Cardinality of Column Families
  • Hotspotting
  • Rowkey Design Notes
  • Security
  • HBase Shell
  • HBase Shell Command Groups
  • Creating and Populating a Table Using HBase Shell
  • Getting a Cell's Value
  • Counting Rows in an HBase Table
  • HBase Java Client
  • HBase Scanners
  • The Scan Class
  • The KeyValue Class
  • The Result Class
  • Getting Versions of Cell Values Example
  • The Cell Interface
  • HBase Java Client Example
  • Scanning the Table Rows
  • Dropping a Table
  • The Bytes Utility Class
  • Table Schema Main Rules to Follow
  • Good Use Cases for HBase
  • Not Good Use Cases for HBase
  • Business Continuity Caveats
  • Summary

Chapter 4. Apache Cassandra

  • What is Apache Cassandra?
  • Main Features
  • Peer-to-Peer (No Master)
  • Wide Column Store NoSQL Databases
  • Cassandra Model vs Relational Model
  • Column Families
  • Columns
  • Simplified Data Model
  • Data Model
  • The CAP Placement
  • CQL
  • CQL Simple Examples
  • The Update Statement
  • Update Caveats
  • Update Statement with TTL and TIMESTAMP Examples
  • Collections
  • Example of Using a Set Collection
  • Using the List Collection
  • Data Replication
  • Visualizing Data Replication
  • The Write Path
  • Sequential Data Storage Engine
  • Java Client Code Example
  • Data Distribution
  • Native Aggregate Functions
  • Creating UDFs
  • HBase vs Apache Cassandra
  • Cassandra vs MongoDB
  • Security
  • WAN-Wide High Availability
  • Summary

Lab Exercises

Lab 1. Learning the Lab Environment
Lab 2. The Hadoop Distributed File System
Lab 3. Using HBase Shell
Lab 4. Comparing NoSQL Systems

We regularly offer classes in these and other cities: Atlanta, Austin, Baltimore, Calgary, Chicago, Cleveland, Dallas, Denver, Detroit, Houston, Jacksonville, Miami, Montreal, New York City, Orlando, Ottawa, Philadelphia, Phoenix, Pittsburgh, Seattle, Toronto, Vancouver, Washington DC.