Data Analysis with Python
Exercise Files:
tinyurl.com/53yy6xfj
Jupyterlab Install:
jupyter.org/install
Seaborn:
seaborn.pydata.org/
Pandas:
pandas.pydata.org/
Who it's for: This training is for learners who want to turn raw datasets into actionable insights using Python, especially those who already have a basic Python foundation (variables, data types, control flow, plus lists, dictionaries, and functions) and are ready to apply common data-analysis libraries such as NumPy, pandas, and seaborn to real-world data cleaning, exploration, and visualization workflows. It is particularly relevant for people who want to use data analysis to diagnose issues, spot trends, and make better operational or business decisions (e.g., improving performance over time, increasing efficiency, and supporting profitability).
What it is: Python is the programming language you use to do data analysis efficiently, especially when you combine it with libraries like pandas (for working with datasets) and visualization tools like seaborn to manipulate, explore, and interpret data and discover real-world insights. In this course, you run that Python work inside JupyterLab, a web-based environment (the next iteration of Jupyter Notebook) that lets you work with notebooks alongside terminals and text editors, keep multiple projects organized in one place, and manage your analysis in a “notebook with multiple pages” style of workflow. JupyterLab runs through a Python kernel on a local server you launch (often after installing it via pip install jupyterlab), and it includes productivity features such as IPython “magic commands” (a single % for line magics, a double %% for cell magics) that help you navigate your workspace and measure or capture code execution.
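As a rough illustration of what the timing magics do under the hood: IPython's %timeit wraps the standard-library timeit module, so the same measurement can be sketched in plain Python (the expression being timed here is an arbitrary example):

```python
import timeit

# %timeit in a notebook wraps the standard-library timeit module;
# here we time a simple expression directly (a minimal sketch)
elapsed = timeit.timeit("sum(range(1000))", number=1000)
print(f"{elapsed:.4f} seconds for 1000 runs")
```

In a notebook cell you would simply write `%timeit sum(range(1000))` on one line, or start a cell with `%%timeit` to time the whole cell.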
What you'll learn: You will learn a practical, end-to-end data analysis lifecycle in Python: importing data from CSV, JSON, Stata files, and SQLite databases; cleaning data by selecting relevant fields, validating missing values, and handling outliers; engineering features with apply and lambda functions; combining datasets with merges/joins; reshaping data from wide to long format; summarizing data with grouping/aggregation and pivot tables; working with time series data through resampling, reindexing, rolling averages, and running totals; and building and evaluating basic predictive models using correlation and scikit-learn's linear and multiple regression, including handling categorical variables with encoding.
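The cleaning and feature-engineering steps above can be sketched in a few lines of pandas (the dataset and the outlier threshold here are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical product records with a missing price and an outlier
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, None, 12.5, 999.0],
})

# Validate missing values, then fill them with the median price
print(df["price"].isna().sum())          # count of missing entries
df["price"] = df["price"].fillna(df["price"].median())

# Engineer a feature with apply + a lambda (threshold is arbitrary)
df["is_outlier"] = df["price"].apply(lambda p: p > 100)
print(df)
```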
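Combining, reshaping, and summarizing might look like the following sketch, using made-up store revenue figures (`sales` and `stores` are invented example tables):

```python
import pandas as pd

sales = pd.DataFrame({"store": [1, 2], "q1": [100, 200], "q2": [110, 190]})
stores = pd.DataFrame({"store": [1, 2], "region": ["East", "West"]})

# Combine datasets with a merge (an inner join on the shared key)
merged = sales.merge(stores, on="store")

# Reshape from wide (one column per quarter) to long format
long = merged.melt(id_vars=["store", "region"],
                   value_vars=["q1", "q2"],
                   var_name="quarter", value_name="revenue")

# Summarize with grouping/aggregation and a pivot table
print(long.groupby("region")["revenue"].sum())
print(long.pivot_table(index="region", columns="quarter", values="revenue"))
```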
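The time-series techniques (resampling, rolling averages, running totals) can be sketched on a small synthetic series; the dates and values are placeholders:

```python
import pandas as pd

# Ten days of hypothetical daily revenue
idx = pd.date_range("2024-01-01", periods=10, freq="D")
ts = pd.Series(range(10), index=idx, dtype="float64")

# Resample daily data up to weekly totals
weekly = ts.resample("W").sum()

# A 3-day rolling average and a running total
rolling = ts.rolling(window=3).mean()
running = ts.cumsum()
print(weekly)
print(running.iloc[-1])   # total over the whole period
```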
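Finally, a minimal sketch of a scikit-learn regression with a one-hot-encoded categorical variable; the ad-spend/revenue numbers are fabricated so the fit is exact, which would not happen with real data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: predict revenue from ad spend plus a categorical region
df = pd.DataFrame({
    "ad_spend": [1.0, 2.0, 3.0, 4.0],
    "region": ["East", "West", "East", "West"],
    "revenue": [10.0, 14.0, 16.0, 20.0],
})

# One-hot encode the categorical column (drop_first avoids collinearity)
X = pd.get_dummies(df[["ad_spend", "region"]], drop_first=True)
y = df["revenue"]

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
print(model.score(X, y))   # R^2 on the training data
```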