Introduction to Data Visualization with Python

Learn to create compelling visualizations for research using Python’s seaborn library. Use the modern seaborn.objects interface through practical examples relevant to policy and development research. Perfect for research associates and managers working with data.

Workshop Overview

This workshop introduces data visualization using Python’s seaborn library, focusing on the modern seaborn.objects interface. Designed for research associates and managers working on policy and development projects, the course emphasizes practical skills for creating clear, compelling visualizations that communicate research findings effectively.

Why seaborn.objects?

The seaborn.objects interface takes a modern approach to data visualization based on the grammar of graphics, the same principled framework behind R’s popular ggplot2. This approach:

Makes complex visualizations easier to build step-by-step
Provides intuitive, declarative syntax
Encourages thinking about visualization components rather than chart types
Integrates seamlessly with pandas DataFrames

Who This Workshop Is For

Research Associates collecting and analyzing data
Research Managers overseeing projects and reviewing findings
Anyone who wants to create better visualizations for:
- Research reports
- Policy briefs
- Stakeholder presentations
- Academic papers
- Data exploration

Prerequisites

Basic Python knowledge: Variables, functions, importing libraries
Familiarity with pandas: Reading data, DataFrames (helpful but not required)
Python environment with these packages installed (the seaborn.objects interface requires seaborn 0.12 or later):
```
uv pip install pandas seaborn matplotlib
```

Learning Objectives

By the end of this workshop, you will be able to:

Understand and apply the grammar of graphics framework
Create a variety of visualizations using seaborn.objects
Map data variables to visual properties (position, color, size, shape)
Choose appropriate visualization types for different research questions
Customize plots with labels, scales, and themes
Create multi-panel figures with faceting
Add statistical summaries and regression lines
Produce publication-ready figures for reports and presentations

Workshop Structure

This workshop consists of seven hands-on lessons:

Lesson 1: Introduction to Seaborn

Duration: 45-60 minutes

Learn why data visualization matters in research, get introduced to seaborn and its objects interface, and create your first visualization. You’ll understand the basic structure of a seaborn.objects plot and practice with the Palmer Penguins dataset.

Key concepts: seaborn.objects, so.Plot(), .add(), basic scatter plots

Lesson 2: The Grammar of Graphics

Duration: 60-75 minutes

Dive deep into the grammar of graphics framework. Learn how to map data variables to different visual properties like color, size, and shape. Create multi-dimensional visualizations that show multiple variables simultaneously.

Key concepts: Aesthetic mappings, color scales, size, shape, transparency, choosing appropriate mappings

Lesson 3: Marks and Geometric Objects

Duration: 60-75 minutes

Explore different types of marks (geometric objects) including dots, lines, bars, areas, and bands. Learn when to use each type and how to combine multiple marks in layered visualizations.

Key concepts: so.Dot(), so.Line(), so.Bar(), so.Area(), so.Band(), layering marks

Lesson 4: Labels, Scales, and Customization

Duration: 45-60 minutes

Make your plots clear and professional with proper labels, titles, and legends. Control scales, axis limits, and color palettes. Learn accessibility best practices, including colorblind-friendly design.

Key concepts: .label(), .scale(), axis limits, color palettes, accessibility

Lesson 5: Faceting and Layering

Duration: 60-75 minutes

Create small multiples (faceted plots) to compare across categories. Learn how to layer multiple visualization types to build rich, comprehensive displays, and when to use faceting versus color encoding.

Key concepts: .facet(), small multiples, combining faceting and layering, multi-panel figures

Lesson 6: Statistical Transformations

Duration: 60-75 minutes

Add statistical summaries directly to your visualizations. Create aggregations, confidence intervals, regression lines, and histograms. Learn how to combine raw data with statistical summaries for complete data stories.

Key concepts: so.Agg(), so.Est(), so.PolyFit(), so.Hist(), confidence intervals, regression

Lesson 7: Themes and Final Polish

Duration: 45-60 minutes

Apply professional themes and fine-tune every aspect of your visualizations. Learn how to save publication-quality figures at appropriate resolutions. Follow best practices for different output formats (papers, presentations, posters).

Key concepts: sns.set_theme(), .theme(), saving figures, DPI, best practices

Total Workshop Duration

Minimum: about 6 hours (core content only)
Recommended: 8-10 hours (with exercises and discussions)
Format suggestions:
- 2-day intensive: 4-5 hours per day
- Weekly series: 1.5-2 hours per week for 5-7 weeks
- Self-paced: Work through at your own speed

Workshop Philosophy

Learn by Doing

Each lesson includes:

Hands-on code examples you can run immediately
Exercises with solutions to test your understanding
Real research scenarios relevant to policy and development work

Research-Focused Examples

While we introduce concepts with standard datasets (like Palmer Penguins), we emphasize examples relevant to research and policy work:

Program impact evaluations
Household surveys
Multi-site comparisons
Time series of development indicators
Educational interventions

Progressive Complexity

Lessons build on each other:

Start with simple scatter plots
Gradually add layers of complexity
End with publication-ready multi-panel figures
Each step adds one new concept

Best Practices Throughout

Learn not just how to create visualizations, but how to create good visualizations:

Accessibility (colorblind-friendly palettes)
Clear labeling and documentation
Appropriate statistical summaries
Professional styling
Honest, ethical representation of data

What You’ll Create

By the end of this workshop, you’ll be able to create visualizations like:

Exploratory scatter plots showing relationships between variables with color and size encoding additional dimensions
Impact evaluation figures with treatment and control groups, confidence intervals, and multiple time points
Multi-panel comparisons showing outcomes across different sites, regions, or demographic groups
Distribution analyses with histograms and density plots comparing multiple categories
Regression visualizations showing relationships with fitted lines and confidence bands
Publication-ready figures with professional styling suitable for academic papers, policy briefs, or presentations

Beyond This Workshop

Continue Learning

Seaborn Documentation: seaborn.pydata.org
Seaborn Objects Guide: seaborn.pydata.org/tutorial/objects_interface.html
Python Graph Gallery: python-graph-gallery.com
Data Visualization Books:
- “Fundamentals of Data Visualization” by Claus O. Wilke
- “The Visual Display of Quantitative Information” by Edward Tufte

Practice with Your Data

The best way to learn and practice data visualization is to:

Apply these techniques to your own research data
Recreate visualizations you see in papers you admire
Get feedback from colleagues and stakeholders
Iterate and refine based on what communicates best

Getting Help

If you encounter issues:

Check the documentation: Seaborn has excellent documentation with many examples
Read error messages carefully: They often point to the solution
Search online: Stack Overflow has many seaborn questions answered
Ask colleagues: Learning together is more effective and fun

Data Sources for Practice

Throughout this workshop, we use:

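Palmer Penguins: Available through seaborn’s load_dataset() (downloaded on first use), great for learning
Tips: Available through seaborn’s load_dataset() (downloaded on first use), good for categorical analysis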
Simulated research data: Created to mirror real research scenarios

For your own practice, consider:

Your current research project data
Publicly available datasets (World Bank, Gapminder, etc.)
Government statistics from the Kenya National Bureau of Statistics
International development indicators

Acknowledgments

This workshop draws inspiration from:

The Carpentries workshops on Python and data visualization
Hadley Wickham’s work on ggplot2 and the layered grammar of graphics
The seaborn development team for creating an excellent library
Research teams worldwide doing important policy and development work

Let’s Begin

Ready to start creating beautiful, informative visualizations?

Begin with Lesson 1: Introduction to Seaborn

License

This workshop is licensed under CC BY 4.0. You are free to:

Share — copy and redistribute the material
Adapt — remix, transform, and build upon the material

Under the following terms:

Attribution — You must give appropriate credit

Good luck with your data visualization journey! Remember: every expert was once a beginner. Take your time, practice regularly, and don’t be afraid to experiment.
