Topics
- Defining Big Data
- Big Data Stores Overview
- NoSQL
- Big Data Business Intelligence and Analytics
- Real World Case Studies
- Adopting NoSQL
Audience
General audience including business and technology team leadership
Pre-requisites
Basic programming skills, some knowledge of SQL
Duration
1 day
Outline for Introduction to Big Data and NoSQL Training
Chapter 1. Introduction to NoSQL Systems
- Gartner's Definition of Big Data
- The 3V Properties
- Limitations of Relational Databases
- Limitations of Relational Databases (Cont'd)
- What are NoSQL (Not Only SQL) Databases?
- What are NoSQL Databases?
- The Past and Present of the NoSQL World
- NoSQL Database Properties
- NoSQL Benefits
- Use Cases for NoSQL Database Systems
- NoSQL Database Storage Types
- The CAP Theorem
- Mechanisms to Guarantee a Single CAP Property
- NoSQL Systems CAP Triangle
- Limitations of NoSQL Databases
- Mix-and-Match Approach
- Big Data Sharding
- Sharding Example
- Google BigTable
- BigTable-based Applications
- BigTable Design
- Barriers to Adoption
- Dismantling Barriers to Adoption
- Industry Trends
- NoSQL Technology Adoption Action Plan
- Quiz
- Quiz Answers
- Summary
Chapter 2. Introduction to Hadoop
- The Client-Server Processing Pattern
- Apache Hadoop
- Apache Hadoop Logo
- Typical Hadoop Applications
- Hadoop Clusters
- Hadoop Distributions
- Hadoop's Main Components
- Hadoop Distributed File System (HDFS)
- HDFS Considerations
- Data Blocks
- HDFS NameNode Directory Diagram
- HDFS Balancing
- Accessing HDFS
- Examples of HDFS Commands
- Other Supported File Systems
- YARN
- Hadoop-based Systems for Data Analysis
- MapReduce
- Similarity with SQL Aggregation Operations
- MapReduce Word Count Example
- Distributed Computing Economics
- Discussion: Divide and Conquer
- Apache Pig
- Pig Latin
- Running Pig
- Pig Latin Script Example
- What is Hive?
- Hive's Value Proposition
- Who uses Hive?
- What Hive Does Not Have
- HiveQL
- Working with Hive Tables
- Summary
Chapter 3. Apache HBase
- What is HBase?
- HBase Design
- HBase Master (HMaster)
- Sparse Data Sets
- Regions and Region Servers
- HBase Features
- HBase High Availability
- The Write-Ahead Log (WAL) and MemStore
- HBase vs RDBMS
- HBase vs RDBMS (Cont'd)
- Interfacing with HBase
- HBase Thrift and REST Gateway
- HBase Table Design
- Column Families
- A Cell's Value Versioning
- Timestamps
- Accessing Cells
- HBase Table Design Digest
- The Conceptual View of an HBase Table
- HBase Compaction
- Loading Data in HBase
- Column Families Notes
- Cardinality of Column Families
- Hotspotting
- Rowkey Design Notes
- Security
- HBase Shell
- HBase Shell Command Groups
- Creating and Populating a Table Using HBase Shell
- Getting a Cell's Value
- Counting Rows in an HBase Table
- HBase Java Client
- HBase Scanners
- The Scan Class
- The KeyValue Class
- The Result Class
- Getting Versions of Cell Values Example
- The Cell Interface
- HBase Java Client Example
- Scanning the Table Rows
- Dropping a Table
- The Bytes Utility Class
- Table Schema Main Rules to Follow
- Good Use Cases for HBase
- Not Good Use Cases for HBase
- Business Continuity Caveats
- Summary
Chapter 4. Apache Cassandra
- What is Apache Cassandra?
- Main Features
- Peer-to-Peer (No Master)
- Wide Column Store NoSQL Databases
- Cassandra Model vs Relational Model
- Column Families
- Columns
- Simplified Data Model
- Data Model
- The CAP Placement
- CQL
- CQL Simple Examples
- The Update Statement
- Update Caveats
- Update Statement with TTL and TIMESTAMP Examples
- Collections
- Example of Using a Set Collection
- Using the List Collection
- Data Replication
- Visualizing Data Replication
- The Write Path
- Sequential Data Storage Engine
- Java Client Code Example
- Data Distribution
- Native Aggregate Functions
- Creating UDFs
- HBase vs Apache Cassandra
- Cassandra vs MongoDB
- Security
- WAN-Wide High Availability
- Summary
Lab Exercises
Lab 1. Learning the Lab Environment
Lab 2. The Hadoop Distributed File System
Lab 3. Using HBase Shell
Lab 4. Comparing NoSQL Systems