Module 4: Data Understanding & Preparation
Overview
Welcome to the Data Understanding & Preparation module of the EDSP mentoring program!
This module introduces the next steps of the data science process: Data Understanding and Data Preparation. It begins with data profiling, which assesses the current state of the data and whether it can satisfy the business objectives of the project. Because data is often messy and incomplete (for example, it may contain missing values), some data preparation is typically necessary before it can meet the needs of machine learning development. Exploratory data analysis (EDA) then aims to identify the relationships among the elements (features) of the dataset, along with the features that exert the greatest influence on the Target (the value being predicted). Because potential correlations may be obscured within the data, and because machine learning algorithms expect data in particular formats (e.g., all numerical values), some form of feature engineering may be necessary to reveal the full predictive power in the data and to make that data satisfy machine learning requirements.
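The four steps described above can be illustrated end to end on a tiny dataset. The sketch below is a minimal, hypothetical example and not the code from the module's tutorial notebooks. It uses only the Python standard library, and it assumes a made-up housing table where `price` is the Target. The techniques were chosen purely for illustration: per-column missing and distinct counts (profiling), median imputation (preparation), Pearson correlation with the Target (EDA), and one-hot encoding of a categorical column (feature engineering).

```python
from math import sqrt

# Hypothetical toy dataset (an assumption for illustration only).
# Each row is a dict; None marks a missing value; "price" is the Target.
ROWS = [
    {"size": 50.0, "rooms": 2, "city": "A", "price": 150.0},
    {"size": 80.0, "rooms": 3, "city": "B", "price": 240.0},
    {"size": None, "rooms": 2, "city": "A", "price": 160.0},
    {"size": 120.0, "rooms": 4, "city": "B", "price": 360.0},
    {"size": 65.0, "rooms": None, "city": "C", "price": 190.0},
]


def profile(rows):
    """Data profiling: count missing and distinct values per column."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        report[col] = {"missing": len(values) - len(present),
                       "distinct": len(set(present))}
    return report


def impute_median(rows, col):
    """Data preparation: fill missing numeric values with the column median."""
    present = sorted(r[col] for r in rows if r[col] is not None)
    n, mid = len(present), len(present) // 2
    median = present[mid] if n % 2 else (present[mid - 1] + present[mid]) / 2
    # Return new rows rather than mutating the originals.
    return [{**r, col: median if r[col] is None else r[col]} for r in rows]


def pearson(xs, ys):
    """EDA: Pearson correlation between one feature and the Target."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def one_hot(rows, col):
    """Feature engineering: replace a categorical column with 0/1 indicators."""
    categories = sorted({r[col] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != col}
        for c in categories:
            new_row[f"{col}_{c}"] = 1 if r[col] == c else 0
        encoded.append(new_row)
    return encoded


report = profile(ROWS)                                      # profiling
clean = impute_median(impute_median(ROWS, "size"), "rooms")  # preparation
target = [r["price"] for r in clean]
corr = {col: pearson([r[col] for r in clean], target)       # EDA
        for col in ("size", "rooms")}
features = one_hot(clean, "city")                           # feature engineering
```

On this toy data, `size` comes out strongly and positively correlated with `price`. The one-hot step turns the text column `city` into all-numeric `city_A`, `city_B`, and `city_C` columns, matching the requirement that algorithms receive numerical input. In practice, libraries such as pandas and scikit-learn provide ready-made versions of each step.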
What you’ll learn
- Data Profiling
- Data Preparation
- Exploratory Data Analysis (EDA)
- Feature Engineering
Topic Kickoff
Resources | Links |
---|---|
Recording | Recording |
Presentation | Presentation |
Table of Contents
Resources | Links |
---|---|
Online Book (oreilly.com) | Python Feature Engineering Cookbook |
Tutorial: Jupyter Notebook | Data Profiling |
Tutorial: Jupyter Notebook | Data Preparation |
Tutorial: Jupyter Notebook | Exploratory Data Analysis (EDA) |
Tutorial: Jupyter Notebook | Feature Engineering |
Data for Tutorials | Data |
Additional / Optional Resources
- MOOC: Analyzing and Visualizing Data with Power BI
- Practice: Data Preprocessing Challenge
- Applied Machine Learning: Feature Engineering (2 hours 26 minutes)