Marks and Geometric Objects

Learn about different types of marks (geometric objects) in seaborn.objects. Create scatter plots, line plots, bar plots, and more. Choose appropriate marks for different data types and research questions.

Learning Objectives
  • Understand different types of marks (geometric objects) available in seaborn.objects
  • Create visualizations using Dot, Line, Bar, Area, and Band marks
  • Choose appropriate marks for different data types and research questions
  • Combine multiple marks in a single plot
  • Understand when to use each type of visualization
Key Questions
  • What types of marks are available in seaborn.objects?
  • How do I choose the right mark for my data?
  • When should I use dots vs. lines vs. bars?
  • How can I combine multiple marks in one plot?

Understanding Marks

In the grammar of graphics, marks (also called geometric objects or “geoms”) are the visual elements that represent data points. Each mark type is designed to show different aspects of data:

  • Dots (so.Dot) - Individual data points
  • Lines (so.Line) - Connections and trends over continuous data
  • Bars (so.Bar) - Comparisons between categories or distributions
  • Area (so.Area) - Cumulative values or filled regions
  • Band (so.Band) - Ranges or confidence intervals
  • Dash (so.Dash) - Short line segments, one per data point (for error bars and range plots, use so.Range)
  • Paths (so.Path) - Connected points in order of appearance

Let’s explore each of these!

Setting Up

import seaborn as sns
import seaborn.objects as so
import pandas as pd
import numpy as np

# Load datasets
penguins = sns.load_dataset("penguins").dropna()
tips = sns.load_dataset("tips")

Dots: Scatter Plots

We’ve already seen so.Dot() - it’s perfect for showing:

  • Relationships between two continuous variables
  • Individual observations
  • Distributions of data points
# Basic scatter plot
(
    so.Plot(
        penguins,
        x="flipper_length_mm",
        y="body_mass_g",
        color="species"
    )
    .add(so.Dot())
)

When to Use Dots

  • Exploring relationships between continuous variables
  • Showing individual data points when sample size is moderate (< 1000)
  • Comparing groups when you want to see all the data
  • Checking for outliers or unusual patterns

Bars: Comparing Categories

so.Bar() creates bar plots, excellent for:

  • Comparing categories
  • Showing counts or frequencies
  • Displaying aggregated values
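# Average body mass by species
(
    so.Plot(penguins, x="species", y="body_mass_g", color="species")
    .add(so.Bar(), so.Agg())  # so.Agg() reduces each species to its mean
)

Note: so.Bar() does not aggregate on its own. It draws one bar per row, so with raw data the bars overlap. Pairing it with so.Agg(), which computes the mean by default, gives one bar per category.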

Grouped Bar Charts

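# Compare by both species and sex
# (the setup dropna() already removed missing values; this is a safeguard)
penguins_complete = penguins.dropna(subset=['sex'])

(
    so.Plot(penguins_complete, x="species", y="body_mass_g", color="sex")
    .add(so.Bar(), so.Agg(), so.Dodge())  # mean per group, bars placed side by side
)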

Horizontal Bars

Sometimes horizontal bars are clearer, especially with long category names:

# Create survey response data
survey_data = pd.DataFrame({
    'response': [
        'Strongly Agree',
        'Agree',
        'Neutral',
        'Disagree',
        'Strongly Disagree'
    ],
    'count': [45, 78, 23, 12, 5]
})

(
    so.Plot(survey_data, x="count", y="response")
    .add(so.Bar())
)
When to Use Bars
  • Categorical comparisons: Comparing values across categories
  • Survey responses: Showing frequency or percentages
  • Rankings: Displaying ordered categories
  • Part-to-whole: When values sum to a meaningful total

Avoid bars when:

  • Showing precise individual values (use dots)
  • You have many categories (>10) - hard to read
  • Showing distributions (use histograms or density plots)

Area: Filled Regions

so.Area() creates filled areas under a line:

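# Cumulative enrollment over time
# enrollment_data is not defined earlier in this lesson; the values below
# are illustrative placeholders so the example runs on its own
enrollment_data = pd.DataFrame({
    'month': range(1, 13),
    'enrolled': [20, 25, 30, 28, 35, 40, 38, 45, 50, 48, 55, 60]
})
cumulative_data = enrollment_data.copy()
cumulative_data['cumulative_enrolled'] = cumulative_data['enrolled'].cumsum()

(
    so.Plot(cumulative_data, x="month", y="cumulative_enrolled")
    .add(so.Area())
)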

Stacked Areas

Great for showing composition over time:

# Budget allocation over time
budget_data = pd.DataFrame({
    'year': list(range(2018, 2024)) * 3,
    'amount': [
        30, 32, 35, 38, 40, 42,  # Research
        20, 22, 25, 28, 30, 33,  # Training
        15, 18, 20, 22, 25, 28   # Admin
    ],
    'category': ['Research'] * 6 + ['Training'] * 6 + ['Admin'] * 6
})

(
    so.Plot(budget_data, x="year", y="amount", color="category")
    .add(so.Area(), so.Stack())  # so.Stack() piles the areas up; without it they overlap
)
When to Use Area
  • Cumulative values: Showing accumulation over time
  • Composition: Parts of a whole over time (stacked areas)
  • Emphasis: Drawing attention to the magnitude of change

Avoid area when:

  • You need to compare exact values (use lines or bars)
  • You have many overlapping areas (hard to read)

Band: Showing Ranges

so.Band() shows ranges or confidence intervals:

# Create data with error ranges
# Imagine tracking income with confidence intervals
income_data = pd.DataFrame({
    'year': range(2018, 2024),
    'mean_income': [450, 480, 520, 550, 590, 630],
    'lower_ci': [420, 445, 485, 510, 545, 580],
    'upper_ci': [480, 515, 555, 590, 635, 680]
})

(
    so.Plot(income_data, x="year")
    .add(so.Band(alpha=0.3), ymin="lower_ci", ymax="upper_ci")
    .add(so.Line(), y="mean_income")
)

This shows the mean income as a line with a confidence band around it.

When to Use Band
  • Uncertainty: Showing confidence intervals or standard errors
  • Ranges: Min-max ranges over time
  • Predictions: Forecast intervals

Combining Multiple Marks

One of the most powerful features is combining marks in layers:

# Research example: Survey responses over time with confidence
survey_trend = pd.DataFrame({
    'round': [1, 2, 3, 4, 5],
    'satisfaction': [3.2, 3.5, 3.8, 4.1, 4.3],
    'lower': [2.9, 3.2, 3.5, 3.8, 4.0],
    'upper': [3.5, 3.8, 4.1, 4.4, 4.6]
})

(
    so.Plot(survey_trend, x="round")
    .add(so.Band(alpha=0.2), ymin="lower", ymax="upper")
    .add(so.Line(linewidth=2), y="satisfaction")
    .add(so.Dot(pointsize=8), y="satisfaction")
)

This creates a rich visualization showing:

  • The confidence range (band)
  • The trend line
  • The actual data points

Real Research Example: Impact Evaluation

Let’s create a more complete research visualization showing program impact over time:

# Simulate treatment and control group outcomes
periods = 6
control = np.array([50, 52, 55, 58, 62, 65])
treatment_effect = np.array([0, 0, 5, 8, 12, 15])  # Effect appears from period 2 onward
treatment = control + treatment_effect  # [50, 52, 60, 66, 74, 80]

impact_data = pd.DataFrame({
    'period': list(range(periods)) * 2,
    'outcome': np.concatenate([control, treatment]),
    'group': ['Control'] * periods + ['Treatment'] * periods
})

# Create visualization
(
    so.Plot(impact_data, x="period", y="outcome", color="group")
    .add(so.Line(linewidth=2))
    .add(so.Dot(pointsize=8))
)

Exercises

Exercise 1: Choosing the Right Mark

For each scenario, identify which mark type would be most appropriate:

  1. Showing monthly rainfall data for 12 months
  2. Comparing average test scores across 5 schools
  3. Displaying the relationship between age and income for 200 individuals
  4. Showing budget allocation across departments for a single year
  5. Tracking population growth with uncertainty bounds

Solution:

  1. Line - time series data showing trend over time
  2. Bar - comparing categories (schools)
  3. Dot - relationship between two continuous variables
  4. Bar - comparing categories (departments) at one time point
  5. Line + Band - trend over time with uncertainty

Exercise 2: Create a Multi-Layer Plot

Using the penguins dataset, create a plot that shows:

  • The average flipper length by species (use bars)
  • Individual data points overlaid (use dots with low alpha)
# Your code here

Solution:

(
    so.Plot(penguins, x="species", y="flipper_length_mm", color="species")
    .add(so.Bar(alpha=0.6), so.Agg())  # so.Agg() computes the mean per species
    .add(so.Dot(alpha=0.2), so.Jitter(0.2))  # Jitter spreads dots horizontally
)

The combination of bars (showing the average) and dots (showing individual values) gives viewers both the summary and the underlying data distribution.

Exercise 3: Time Series Practice

Create sample data for a research scenario:

  • Track household savings for 12 months
  • Include data for 2 villages
  • Create a visualization that clearly shows the comparison
# Create the data structure
# Your code here

# Create the visualization
# Your code here

Solution:

# Create data
months = pd.date_range('2023-01-01', periods=12, freq='MS')  # 'MS' = month start
savings_data = pd.DataFrame({
    'month': list(months) * 2,
    'savings': (
        [5000, 5200, 5500, 5800, 6100, 6500, 6900, 7200, 7600, 8000, 8300, 8700] +  # Village A
        [4500, 4600, 4800, 5100, 5400, 5800, 6100, 6500, 6800, 7200, 7500, 7900]     # Village B
    ),
    'village': ['Village A'] * 12 + ['Village B'] * 12
})

# Visualize
(
    so.Plot(savings_data, x="month", y="savings", color="village")
    .add(so.Line(linewidth=2))
    .add(so.Dot(pointsize=6))
)

Decision Guide: Choosing Your Mark

Is your data categorical?
├─ Yes → Use Bar (for comparisons) or Dot (for individual values)
└─ No → Is it ordered/sequential?
    ├─ Yes (time series or other ordered sequence)
    │   ├─ Showing trend? → Use Line
    │   ├─ Showing accumulation? → Use Area
    │   └─ Showing range/uncertainty? → Use Band
    └─ No (unordered continuous data)
        └─ Use Dot (scatter plot)
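The same decision tree can be written as a small helper function, which makes the branching explicit. This is only a sketch: suggest_mark and its arguments are hypothetical names invented here, not part of seaborn, and real choices usually involve more judgment than three inputs can capture.

```python
def suggest_mark(categorical: bool, ordered: bool = False, goal: str = "trend") -> str:
    """Return a mark name following the decision guide above.

    goal is one of "comparison", "individual", "trend",
    "accumulation", or "range" (hypothetical labels).
    """
    if categorical:
        # Categorical data: bars for comparisons, dots for individual values
        return "Dot" if goal == "individual" else "Bar"
    if ordered:
        # Ordered data: pick by what the plot should emphasize
        by_goal = {"trend": "Line", "accumulation": "Area", "range": "Band"}
        return by_goal.get(goal, "Line")
    # Unordered continuous data: scatter plot
    return "Dot"


print(suggest_mark(categorical=True, goal="comparison"))
print(suggest_mark(categorical=False, ordered=True, goal="range"))
print(suggest_mark(categorical=False, ordered=False))
```

The returned names correspond to so.Bar, so.Dot, so.Line, so.Area, and so.Band.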

Common Combinations

Some mark combinations work particularly well together:

Combination | Use Case                     | Example
Line + Dot  | Time series with data points | Monthly metrics
Band + Line | Trends with uncertainty      | Predictions with CI
Bar + Dot   | Summary + individual data    | Group comparisons
Area + Line | Cumulative with rate         | Total enrollment + monthly change

Key Points
  • Different marks (geometric objects) serve different purposes
  • Dots show individual observations and relationships
  • Lines show trends and connections over ordered data
  • Bars compare categories and show aggregated values
  • Area shows filled regions and accumulation
  • Band displays ranges and uncertainty
  • Combine multiple marks in layers for richer visualizations
  • Choose marks based on your data type and research question
  • Always consider what story you’re trying to tell
Looking Ahead

In the next lesson, we’ll learn how to customize our plots with labels, titles, scales, and themes to create publication-ready visualizations that effectively communicate our findings to stakeholders.
