Module 4: Data Understanding & Preparation

Overview

Welcome to the Data Understanding & Preparation module of the EDSP mentoring program!

This module introduces the next steps of the data science process: Data Understanding and Data Preparation. It begins with data profiling, which establishes the current state of the data and whether it can satisfy the business objectives of the project. Because data is often messy and incomplete, some data preparation is typically necessary before it can meet the needs of machine learning development. Exploratory data analysis (EDA) aims to identify the relationships among the elements (features) of the dataset, along with the features that exert the greatest influence on the Target (the value being predicted). Because potential correlations may be obscured within the data, and because machine learning algorithms expect data in particular formats (e.g., all numerical values), some form of feature engineering may be necessary to reveal the full predictive power of the data and to make it satisfy machine learning requirements.
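As a rough illustration of how these four steps can fit together, here is a minimal sketch in Python with pandas. The dataset, its column names (`age`, `plan`, `monthly_spend`, `churned`), and the choice of median/mode imputation and one-hot encoding are all assumptions made up for this example; they are not taken from the module tutorials, and a real project may call for different techniques.

```python
import pandas as pd

# Hypothetical toy dataset; columns and values are invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 58, 36],
    "plan": ["basic", "pro", "basic", None, "pro", "basic"],
    "monthly_spend": [20.0, 55.0, 22.0, 48.0, 60.0, 25.0],
    "churned": [1, 0, 1, 0, 0, 1],  # the Target (value being predicted)
})

# 1. Data profiling: summarize type, missing values, and cardinality per column.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "unique": df.nunique(),
})

# 2. Data preparation: fill gaps (median for numeric, most frequent for categorical).
#    This is one simple imputation strategy among many.
prepared = df.copy()
prepared["age"] = prepared["age"].fillna(prepared["age"].median())
prepared["plan"] = prepared["plan"].fillna(prepared["plan"].mode()[0])

# 3. EDA: correlation of each numeric feature with the Target.
numeric = prepared.select_dtypes("number")
correlations = numeric.corr()["churned"].drop("churned")

# 4. Feature engineering: one-hot encode the categorical column so that
#    every feature is numeric, as many ML algorithms expect.
features = pd.get_dummies(prepared, columns=["plan"], dtype=int)

print(profile)
print(correlations)
print(features.head())
```

The profiling table shows where the gaps are before anything is changed, and the correlation step gives a first hint of which features move with the Target. Correlation only captures linear relationships, so it is a starting point for EDA rather than a full answer.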

What you’ll learn

  • Data Profiling
  • Data Preparation
  • Exploratory Data Analysis (EDA)
  • Feature Engineering

Topic Kickoff

Resources Links
Recording Recording
Presentation Presentation

Table of Contents

Resources Links
Online Book: (oreilly.com) Python Feature Engineering Cookbook
Tutorial: Jupyter Notebook Data Profiling
Tutorial: Jupyter Notebook Data Preparation
Tutorial: Jupyter Notebook Exploratory Data Analysis (EDA)
Tutorial: Jupyter Notebook Feature Engineering
Data for Tutorials: Data

Additional / Optional Resources