Introduction to Data Visualization with Python

Learn to create compelling visualizations for research using Python’s seaborn library. Master the modern seaborn.objects interface through practical examples relevant to policy and development research. Perfect for research associates and managers working with data.

Workshop Overview

This workshop introduces data visualization using Python’s seaborn library, focusing on the modern seaborn.objects interface. Designed for research associates and managers working on policy and development projects, the course emphasizes practical skills for creating clear, compelling visualizations that communicate research findings effectively.

Why seaborn.objects?

The seaborn.objects interface is a modern approach to data visualization based on the grammar of graphics, the same principled framework behind R’s popular ggplot2. This approach:

  • Makes complex visualizations easier to build step-by-step
  • Provides intuitive, declarative syntax
  • Encourages thinking about visualization components rather than chart types
  • Integrates seamlessly with pandas DataFrames

Who This Workshop Is For

  • Research Associates collecting and analyzing data
  • Research Managers overseeing projects and reviewing findings
  • Anyone who wants to create better visualizations for:
    • Research reports
    • Policy briefs
    • Stakeholder presentations
    • Academic papers
    • Data exploration

Prerequisites

  1. Basic Python knowledge: Variables, functions, importing libraries

  2. Familiarity with pandas: Reading data, DataFrames (helpful but not required)

  3. Python environment with these packages installed:

    uv pip install pandas seaborn matplotlib

See setup instructions for detailed installation guidance.
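
Once the packages are installed, a short check like the one below can confirm that the environment is ready. This is an illustrative sketch rather than part of the setup instructions; it relies on the fact that seaborn.objects first shipped in seaborn 0.12, so an older install would need upgrading.

```python
# Quick environment check: confirm the packages import and report versions.
import matplotlib
import pandas as pd
import seaborn as sns

print("pandas    ", pd.__version__)
print("seaborn   ", sns.__version__)
print("matplotlib", matplotlib.__version__)

# seaborn.objects was introduced in seaborn 0.12.
major, minor = (int(part) for part in sns.__version__.split(".")[:2])
has_objects_interface = (major, minor) >= (0, 12)
print("seaborn.objects available:", has_objects_interface)
```

If the last line prints False, upgrading seaborn should fix it.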

Learning Objectives

By the end of this workshop, you will be able to:

  • Understand and apply the grammar of graphics framework
  • Create a variety of visualizations using seaborn.objects
  • Map data variables to visual properties (position, color, size, shape)
  • Choose appropriate visualization types for different research questions
  • Customize plots with labels, scales, and themes
  • Create multi-panel figures with faceting
  • Add statistical summaries and regression lines
  • Produce publication-ready figures for reports and presentations

Workshop Structure

This workshop consists of seven hands-on lessons:

Lesson 1: Introduction to Seaborn

Duration: 45-60 minutes

Learn why data visualization matters in research, get introduced to seaborn and its objects interface, and create your first visualization. You’ll understand the basic structure of a seaborn.objects plot and practice with the Palmer Penguins dataset.

Key concepts: seaborn.objects, so.Plot(), .add(), basic scatter plots

Lesson 2: The Grammar of Graphics

Duration: 60-75 minutes

Dive deep into the grammar of graphics framework. Learn how to map data variables to different visual properties like color, size, and shape. Create multi-dimensional visualizations that show multiple variables simultaneously.

Key concepts: Aesthetic mappings, color scales, size, shape, transparency, choosing appropriate mappings

Lesson 3: Marks and Geometric Objects

Duration: 60-75 minutes

Explore different types of marks (geometric objects) including dots, lines, bars, areas, and bands. Learn when to use each type and how to combine multiple marks in layered visualizations.

Key concepts: so.Dot(), so.Line(), so.Bar(), so.Area(), so.Band(), layering marks

Lesson 4: Labels, Scales, and Customization

Duration: 45-60 minutes

Make your plots clear and professional with proper labels, titles, and legends. Control scales, axes limits, and color palettes. Learn accessibility best practices including colorblind-friendly design.

Key concepts: .label(), .scale(), axis limits, color palettes, accessibility

Lesson 5: Faceting and Layering

Duration: 60-75 minutes

Create small multiples (faceted plots) to compare across categories. Master the art of layering multiple visualization types to build rich, comprehensive displays. Learn when to use faceting versus color encoding.

Key concepts: .facet(), small multiples, combining faceting and layering, multi-panel figures

Lesson 6: Statistical Transformations

Duration: 60-75 minutes

Add statistical summaries directly to your visualizations. Create aggregations, confidence intervals, regression lines, and histograms. Learn how to combine raw data with statistical summaries for complete data stories.

Key concepts: so.Agg(), so.Est(), so.PolyFit(), so.Hist(), confidence intervals, regression

Lesson 7: Themes and Final Polish

Duration: 45-60 minutes

Apply professional themes and fine-tune every aspect of your visualizations. Learn how to save publication-quality figures at appropriate resolutions. Follow best practices for different output formats (papers, presentations, posters).

Key concepts: sns.set_theme(), .theme(), saving figures, DPI, best practices

Total Workshop Duration

  • Minimum: about 6 hours (core content only)
  • Recommended: 8-10 hours (with exercises and discussions)
  • Format suggestions:
    • 2-day intensive: 4-5 hours per day
    • Weekly series: 1.5-2 hours per week for 5-7 weeks
    • Self-paced: Work through at your own speed

Workshop Philosophy

Learn by Doing

Each lesson includes:

  • Hands-on code examples you can run immediately
  • Exercises with solutions to test your understanding
  • Real research scenarios relevant to policy and development work

Research-Focused Examples

While we introduce concepts with standard datasets (like Palmer Penguins), we emphasize examples relevant to research and policy work:

  • Program impact evaluations
  • Household surveys
  • Multi-site comparisons
  • Time series of development indicators
  • Educational interventions

Progressive Complexity

Lessons build on each other:

  • Start with simple scatter plots
  • Gradually add layers of complexity
  • End with publication-ready multi-panel figures
  • Each step adds one new concept

Best Practices Throughout

Learn not just how to create visualizations, but how to create good visualizations:

  • Accessibility (colorblind-friendly palettes)
  • Clear labeling and documentation
  • Appropriate statistical summaries
  • Professional styling
  • Honest, ethical representation of data

What You’ll Create

By the end of this workshop, you’ll be able to create visualizations like:

  1. Exploratory scatter plots showing relationships between variables with color and size encoding additional dimensions

  2. Impact evaluation figures with treatment and control groups, confidence intervals, and multiple time points

  3. Multi-panel comparisons showing outcomes across different sites, regions, or demographic groups

  4. Distribution analyses with histograms and density plots comparing multiple categories

  5. Regression visualizations showing relationships with fitted lines and confidence bands

  6. Publication-ready figures with professional styling suitable for academic papers, policy briefs, or presentations

Beyond This Workshop

Continue Learning

Practice with Your Data

The best way to master visualization is to:

  1. Apply these techniques to your own research data
  2. Recreate visualizations you see in papers you admire
  3. Get feedback from colleagues and stakeholders
  4. Iterate and refine based on what communicates best

Share and Get Feedback

  • Present your visualizations to colleagues
  • Ask for feedback: “Is the message clear?”
  • Iterate based on audience response
  • Build a portfolio of your best work

Getting Help

If you encounter issues:

  1. Check the documentation: Seaborn has excellent documentation with many examples
  2. Read error messages carefully: They often point to the solution
  3. Search online: Stack Overflow has many seaborn questions answered
  4. Ask colleagues: Learning together is more effective and fun

Data Sources for Practice

Throughout this workshop, we use:

  • Palmer Penguins: Available through seaborn’s load_dataset() (downloaded on first use), great for learning
  • Tips: Available through seaborn’s load_dataset(), good for categorical analysis
  • Simulated research data: Created to mirror real research scenarios

For your own practice, consider:

  • Your current research project data
  • Publicly available datasets (World Bank, Gapminder, etc.)
  • Government statistics from the Kenya National Bureau of Statistics
  • International development indicators

Acknowledgments

This workshop draws inspiration from:

  • The Carpentries workshops on Python and data visualization
  • Hadley Wickham’s work on the layered grammar of graphics and ggplot2
  • The seaborn development team for creating an excellent library
  • Research teams worldwide doing important policy and development work

Let’s Begin

Ready to start creating beautiful, informative visualizations?

Begin with Lesson 1: Introduction to Seaborn


License

This workshop is licensed under CC BY 4.0. You are free to:

  • Share — copy and redistribute the material
  • Adapt — remix, transform, and build upon the material

Under the following terms:

  • Attribution — You must give appropriate credit

Good luck with your data visualization journey! Remember: every expert was once a beginner. Take your time, practice regularly, and don’t be afraid to experiment.
