Introduction to Data Visualization with Python
Learn to create compelling visualizations for research using Python’s seaborn library. Master the modern seaborn.objects interface through practical examples relevant to policy and development research. Perfect for research associates and managers working with data.
Workshop Overview
This workshop introduces data visualization using Python’s seaborn library, focusing on the modern seaborn.objects interface. Designed for research associates and managers working on policy and development projects, the course emphasizes practical skills for creating clear, compelling visualizations that communicate research findings effectively.
Why seaborn.objects?
The seaborn.objects interface takes a modern approach to data visualization based on the grammar of graphics, the same principled framework behind R’s popular ggplot2. This approach:
- Makes complex visualizations easier to build step-by-step
- Provides intuitive, declarative syntax
- Encourages thinking about visualization components rather than chart types
- Integrates seamlessly with pandas DataFrames
Who This Workshop Is For
- Research Associates collecting and analyzing data
- Research Managers overseeing projects and reviewing findings
- Anyone who wants to create better visualizations for:
  - Research reports
  - Policy briefs
  - Stakeholder presentations
  - Academic papers
  - Data exploration
Prerequisites
- Basic Python knowledge: variables, functions, importing libraries
- Familiarity with pandas: reading data, DataFrames (helpful but not required)
- A Python environment with these packages installed:
uv pip install pandas seaborn matplotlib
See setup instructions for detailed installation guidance.
Learning Objectives
By the end of this workshop, you will be able to:
- Understand and apply the grammar of graphics framework
- Create a variety of visualizations using seaborn.objects
- Map data variables to visual properties (position, color, size, shape)
- Choose appropriate visualization types for different research questions
- Customize plots with labels, scales, and themes
- Create multi-panel figures with faceting
- Add statistical summaries and regression lines
- Produce publication-ready figures for reports and presentations
Workshop Structure
This workshop consists of seven hands-on lessons:
Lesson 1: Introduction to Seaborn
Duration: 45-60 minutes
Learn why data visualization matters in research, get introduced to seaborn and its objects interface, and create your first visualization. You’ll understand the basic structure of a seaborn.objects plot and practice with the Palmer Penguins dataset.
Key concepts: seaborn.objects, so.Plot(), .add(), basic scatter plots
Lesson 2: The Grammar of Graphics
Duration: 60-75 minutes
Dive deep into the grammar of graphics framework. Learn how to map data variables to different visual properties like color, size, and shape. Create multi-dimensional visualizations that show multiple variables simultaneously.
Key concepts: Aesthetic mappings, color scales, size, shape, transparency, choosing appropriate mappings
Lesson 3: Marks and Geometric Objects
Duration: 60-75 minutes
Explore different types of marks (geometric objects) including dots, lines, bars, areas, and bands. Learn when to use each type and how to combine multiple marks in layered visualizations.
Key concepts: so.Dot(), so.Line(), so.Bar(), so.Area(), so.Band(), layering marks
Lesson 4: Labels, Scales, and Customization
Duration: 45-60 minutes
Make your plots clear and professional with proper labels, titles, and legends. Control scales, axes limits, and color palettes. Learn accessibility best practices including colorblind-friendly design.
Key concepts: .label(), .scale(), axis limits, color palettes, accessibility
Lesson 5: Faceting and Layering
Duration: 60-75 minutes
Create small multiples (faceted plots) to compare across categories. Master the art of layering multiple visualization types to build rich, comprehensive displays. Learn when to use faceting versus color encoding.
Key concepts: .facet(), small multiples, combining faceting and layering, multi-panel figures
Lesson 6: Statistical Transformations
Duration: 60-75 minutes
Add statistical summaries directly to your visualizations. Create aggregations, confidence intervals, regression lines, and histograms. Learn how to combine raw data with statistical summaries for complete data stories.
Key concepts: so.Agg(), so.Est(), so.PolyFit(), so.Hist(), confidence intervals, regression
Lesson 7: Themes and Final Polish
Duration: 45-60 minutes
Apply professional themes and fine-tune every aspect of your visualizations. Learn how to save publication-quality figures at appropriate resolutions. Follow best practices for different output formats (papers, presentations, posters).
Key concepts: sns.set_theme(), .theme(), saving figures, DPI, best practices
Total Workshop Duration
- Minimum: 6 hours (core content only)
- Recommended: 8-10 hours (with exercises and discussions)
- Format suggestions:
  - 2-day intensive: 4-5 hours per day
  - Weekly series: 1.5-2 hours per week for 5-7 weeks
  - Self-paced: Work through at your own speed
Workshop Philosophy
Learn by Doing
Each lesson includes:
- Hands-on code examples you can run immediately
- Exercises with solutions to test your understanding
- Real research scenarios relevant to policy and development work
Research-Focused Examples
While we introduce concepts with standard datasets (like Palmer Penguins), we emphasize examples relevant to research and policy work:
- Program impact evaluations
- Household surveys
- Multi-site comparisons
- Time series of development indicators
- Educational interventions
Progressive Complexity
Lessons build on each other:
- Start with simple scatter plots
- Gradually add layers of complexity
- End with publication-ready multi-panel figures
- Each step adds one new concept
Best Practices Throughout
Learn not just how to create visualizations, but how to create good visualizations:
- Accessibility (colorblind-friendly palettes)
- Clear labeling and documentation
- Appropriate statistical summaries
- Professional styling
- Honest, ethical representation of data
What You’ll Create
By the end of this workshop, you’ll be able to create visualizations like:
- Exploratory scatter plots showing relationships between variables with color and size encoding additional dimensions
- Impact evaluation figures with treatment and control groups, confidence intervals, and multiple time points
- Multi-panel comparisons showing outcomes across different sites, regions, or demographic groups
- Distribution analyses with histograms and density plots comparing multiple categories
- Regression visualizations showing relationships with fitted lines and confidence bands
- Publication-ready figures with professional styling suitable for academic papers, policy briefs, or presentations
Beyond This Workshop
Continue Learning
- Seaborn Documentation: seaborn.pydata.org
- Seaborn Objects Guide: seaborn.pydata.org/tutorial/objects_interface.html
- Python Graph Gallery: python-graph-gallery.com
- Data Visualization Books:
  - “Fundamentals of Data Visualization” by Claus O. Wilke
  - “The Visual Display of Quantitative Information” by Edward Tufte
Practice with Your Data
The best way to master visualization is to:
- Apply these techniques to your own research data
- Recreate visualizations you see in papers you admire
- Get feedback from colleagues and stakeholders
- Iterate and refine based on what communicates best
Getting Help
If you encounter issues:
- Check the documentation: Seaborn has excellent documentation with many examples
- Read error messages carefully: They often point to the solution
- Search online: Stack Overflow has many seaborn questions answered
- Ask colleagues: Learning together is more effective and fun
Data Sources for Practice
Throughout this workshop, we use:
- Palmer Penguins: Loaded with seaborn’s load_dataset() (fetched online on first use), great for learning
- Tips: Loaded the same way, good for categorical analysis
- Simulated research data: Created to mirror real research scenarios
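Simulated research data can be generated in a few lines. The sketch below is one possible setup, not the workshop's actual generator: the sample size, score distribution, and 5-point treatment effect are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# The example datasets would typically be loaded like this (needs internet on first use):
# import seaborn as sns
# penguins = sns.load_dataset("penguins")

# Simulate a simple two-arm evaluation with a fixed seed for reproducibility.
rng = np.random.default_rng(seed=42)
n = 200

arm = rng.choice(["control", "treatment"], size=n)
pre_score = rng.normal(loc=50, scale=10, size=n)
effect = np.where(arm == "treatment", 5.0, 0.0)        # assumed treatment effect
post_score = pre_score + effect + rng.normal(loc=0, scale=5, size=n)

simulated = pd.DataFrame({
    "participant_id": np.arange(1, n + 1),
    "arm": arm,
    "pre_score": pre_score,
    "post_score": post_score,
})
```

Because the true effect is known, simulated data like this is useful for checking whether a visualization reveals what it should.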
For your own practice, consider:
- Your current research project data
- Publicly available datasets (World Bank, Gapminder, etc.)
- Government statistics from the Kenya National Bureau of Statistics
- International development indicators
Acknowledgments
This workshop draws inspiration from:
- The Carpentries workshops on Python and data visualization
- Hadley Wickham’s work on the grammar of graphics
- The seaborn development team for creating an excellent library
- Research teams worldwide doing important policy and development work
Let’s Begin
Ready to start creating beautiful, informative visualizations?
Begin with Lesson 1: Introduction to Seaborn
License
This workshop is licensed under CC BY 4.0. You are free to:
- Share — copy and redistribute the material
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit
Good luck with your data visualization journey! Remember: every expert was once a beginner. Take your time, practice regularly, and don’t be afraid to experiment.