Data Visualization with matplotlib and seaborn in Python

March 4, 2020

This tutorial is adapted from Web Age course Advanced Data Analytics with PySpark. 1.1 Data Visualization The common wisdom states that ‘Seeing is believing and a…

Data Science and ML Algorithms with PySpark

December 11, 2019

This tutorial is adapted from Web Age course Practical Machine Learning with Apache Spark. 8.1 Types of Machine Learning There are three main types of machine…

Introduction to Jupyter Notebooks

November 25, 2019

This tutorial is adapted from Web Age course Practical Machine Learning with Apache Spark. 6.1 Python Dev Tools and REPLs In addition to the standard Python…

Data Visualization in Python using Matplotlib

November 25, 2019

7.1 What is Data Visualization? The common wisdom states that seeing is believing and a picture is worth a thousand words. Data visualization techniques help users…

Distributed Computing Concepts for Data Engineers

November 15, 2019

1.1 The Traditional Client–Server Processing Pattern It is good for small-to-medium data set sizes. Fetching 1 TB worth of data might take longer than 1 hour.…

What is Data Engineering?

November 15, 2019

1.1 Data is King Data is king and it outlives applications. Applications outlive integrations. Organizations striving to become data-driven need to institute efficient, intelligent, and robust ways for…

PySpark Shell

October 17, 2019

1.1 What is Spark Shell? The Spark Shell offers interactive command-line environments for Scala and Python users. SparkR Shell has only been thoroughly tested to work…

Introduction to PySpark

October 16, 2019

1.1 What is Apache Spark? Apache Spark (Spark) is a general-purpose processing system for large-scale data. Spark is effective for data processing of up to…

Data Ingestion in AWS

October 16, 2019

Multipart Upload Overview The Multipart upload API enables you to upload large objects in parts. Multipart uploading is a three-step process: You initiate the…

Python for Data Science

July 25, 2019

This tutorial is adapted from Web Age course Applied Data Science with Python. This tutorial provides a quick overview of Python modules and high-power features, NumPy library,…