Data Science at IPA
Hands-on tutorials for learning Python programming, data analysis, visualization, and web scraping. These self-paced resources help research and data staff at IPA build practical skills.
Data science at IPA encompasses the tools and techniques needed to collect, analyze, and visualize research data. The tutorials in this section show how to apply those tools and techniques to research workflows.
Learning Paths
This section organizes tutorials into three complementary learning paths. Each path focuses on one stage of the data science workflow: collecting data from the web, analyzing it, and creating compelling visualizations. Within each path, the tutorials build progressively, starting with fundamental concepts and advancing to more sophisticated techniques.
These resources emphasize learning by doing. Each tutorial includes practical examples and exercises relevant to research and development work. The Python Data Analysis series provides a comprehensive introduction to programming, while the Web Scraping and Data Visualization tutorials offer focused skill-building in specialized areas.
Web Scraping
Web scraping enables automated data collection from websites, making it possible to gather research data that may not be available through traditional channels. Approaches to web scraping range from the basic to the complex. This tutorial series introduces gazpacho, a lightweight Python library that simplifies the process of extracting data from web pages.
- Introduction to Web Scraping
- Getting Started with Gazpacho
- Making HTTP Requests
- Parsing HTML with Soup
- Integration with Pandas
- Advanced Selection
These tutorials guide you through the fundamentals of web scraping, from making HTTP requests to parsing HTML and integrating scraped data with pandas for analysis. The series emphasizes ethical web scraping practices and provides practical examples for common research scenarios.
Python Data Analysis
This comprehensive tutorial series introduces Python programming through practical data analysis. Adapted from The Carpentries, these lessons build a thorough foundation in Python fundamentals using real-world datasets.
- Introduction to Python
- Running and Quitting
- Variables and Assignment
- Data Types and Conversion
- Built-in Functions
- Libraries
- Lists
- For Loops
- Conditionals
- Looping Over Data Sets
- Writing Functions
- Variable Scope
- Programming Style
- Reading Tabular Data
- Pandas DataFrames
- Plotting
The tutorials progress from basic Python concepts to working with pandas DataFrames, the standard tool for tabular data analysis in Python. This series is ideal for those with little or no previous programming experience who want to learn Python for research data analysis.
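To give a sense of where the series ends up, here is a minimal sketch of reading tabular data into a pandas DataFrame and summarizing it. The column names and values are invented for illustration, and the CSV text is inlined so the example runs without any files; in practice the data would usually be read from a file path.

```python
import io

import pandas as pd

# Hypothetical survey data, inlined so the example runs without any files.
csv_text = """household_id,district,treatment,monthly_income
1,North,1,120.0
2,North,0,95.5
3,South,1,143.0
4,South,0,110.5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text), index_col="household_id")

# Filter rows with a boolean condition.
treated = df[df["treatment"] == 1]

# Group rows by district and compute the mean income in each group.
mean_income = df.groupby("district")["monthly_income"].mean()

print(df.dtypes)
print(mean_income)
```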
Data Visualization
Creating effective visualizations is essential for communicating research findings. This tutorial series focuses on seaborn, a powerful Python library for creating publication-ready figures. It uses the modern seaborn.objects interface, which is based on the grammar of graphics.
- Introduction to Visualization
- Introduction to Seaborn
- Grammar of Graphics
- Marks and Geometric Objects
- Faceting and Layering
- Labels and Customization
- Statistical Transformations
- Themes and Final Polish
These tutorials teach you to build visualizations step by step using a principled, declarative approach. The series emphasizes creating clear, compelling figures that communicate research results to diverse audiences.
Getting Started
If you are new to Python, start with the Python Data Analysis series to build foundational programming skills for working with data. After you become comfortable with Python basics and pandas, explore the Web Scraping tutorials to learn data collection techniques, or the Data Visualization series to create compelling figures for presentations and publications.