Data Science at IPA

Hands-on tutorials for learning Python programming, data analysis, visualization, and web scraping. These self-paced resources help research and data staff at IPA build practical skills.

Data science at IPA encompasses the tools and techniques needed to collect, analyze, and visualize research data. The tutorials in this section show how to apply those tools to everyday research workflows.

Learning Paths

This section organizes tutorials into three complementary learning paths. Each path focuses on one stage of the data science workflow: collecting data from the web, analyzing it, or creating compelling visualizations. The tutorials build progressively, starting with fundamental concepts and advancing to more sophisticated techniques.

These resources emphasize learning by doing. Each tutorial includes practical examples and exercises relevant to research and development work. The Python Data Analysis series provides a comprehensive introduction to programming, while the Web Scraping and Data Visualization tutorials offer focused skill-building in specialized areas.

Web Scraping

Web scraping enables automated data collection from websites, making it possible to gather research data that may not be available through traditional channels. There are many ways to approach web scraping, from the basic to the complex. This tutorial series introduces gazpacho, a lightweight Python library that simplifies the process of extracting data from web pages.

These tutorials guide you through the fundamentals of web scraping, from making HTTP requests to parsing HTML and integrating scraped data with pandas for analysis. The series emphasizes ethical web scraping practices and provides practical examples for common research scenarios.

Python Data Analysis

This comprehensive tutorial series introduces Python programming through practical data analysis. Adapted from The Carpentries, these lessons provide a thorough foundation in Python fundamentals while working with real-world datasets.

The tutorials progress from basic Python concepts to working with pandas DataFrames, the standard tool for tabular data analysis in Python. This series is ideal for those with little or no previous programming experience who want to learn Python for research data analysis.
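
To give a sense of what working with a DataFrame looks like, here is a minimal sketch that builds a small table, filters its rows, and computes a grouped average. The column names and values are hypothetical and are not taken from the tutorial datasets.

```python
import pandas as pd

# Hypothetical household survey data, invented for illustration.
surveys = pd.DataFrame(
    {
        "village": ["A", "A", "B", "B", "C"],
        "household_size": [4, 6, 3, 5, 7],
        "treated": [True, False, True, False, True],
    }
)

# Boolean indexing keeps only the rows where "treated" is True.
treated_only = surveys[surveys["treated"]]

# groupby splits the rows by village, then mean() averages each group.
mean_size = surveys.groupby("village")["household_size"].mean()

print(treated_only)
print(mean_size)
```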

Data Visualization

Creating effective visualizations is essential for communicating research findings. This tutorial series focuses on seaborn, a Python library for creating publication-ready figures, and uses its modern seaborn.objects interface, which is based on the grammar of graphics.

These tutorials teach you to build visualizations step by step using a principled, declarative approach. The series emphasizes creating clear, compelling figures that communicate research results to diverse audiences.

Getting Started

If you are new to Python, start with the Python Data Analysis series to build foundational programming skills for working with data. After you become comfortable with Python basics and pandas, explore the Web Scraping tutorials to learn data collection techniques, or the Data Visualization series to create compelling figures for presentations and publications.