Posted on April 15, 2021
AWS Glue PySpark Extensions
This tutorial is adapted from the Web Age course Data Analytics on AWS.
1.1 AWS Glue and Spark
AWS Glue is based on the Apache Spark platform, extending it with Glue-specific libraries. In this tutorial, we will only review Glue’s support for PySpark. As of version 2.0, Glue supports Python 3, which you should use in your development.

Posted on April 5, 2021 (updated April 9, 2021)
How to Repair and Normalize Data with Pandas?
This tutorial is adapted from the Web Age course Data Engineering Bootcamp Training Using Python and PySpark.
When you embark on a new data engineering, data science, or machine learning project, you may be faced right away with defects in your input dataset.

Posted on April 3, 2021 (updated April 9, 2021)
Data Visualization and EDA with Pandas and Seaborn
This tutorial is adapted from the Web Age course Intermediate Data Engineering with Python.
Data visualization is a great vehicle for communicating data analysis results to stakeholders who may not be technical, and it is also a critical activity in exploratory data analysis (EDA). In this tutorial, you will learn about the data visualization options available in Python.