> uv add jupyterlab pandas matplotlib seaborn
Python
Python is a high-level, general-purpose programming language that is widely used in data science, machine learning, and web development. It has a large standard library and a vibrant community that provides a wide range of libraries and tools for various applications. As such, Python provides a general-purpose ecosystem that can be used for a wide range of applications.
How to install Python?
There are many ways to install Python. We recommend using Python in a virtual environment to avoid conflicts with other Python installations on your system.
We recommend using uv as a a simple way to create and manage Python virtual environments.
You can manage the python packages that are installed in the virtual environment using a pyproject.toml
file. See the pyproject.toml example in this repository for an example of how to manage Python packages. To add package dependencies to the virtual environment, using uv
, you can run:
First, install uv
using winget
(Windows) or brew
(MacOS/Linux):
# Install uv
winget install astral-sh.uv
# Install uv
brew install uv
# Install uv
brew install uv
Add libraries to the virtual environment using uv add ...
:
Coding Conventions
We highly recommend working with a virtual environment to manage Python dependencies. The pyproject.toml
is the preferred way to keep track of python dependencies as well as project-specific python conventions.
We recommend using Ruff to enforce linting and formatting rules. In most cases you can use the default linting and formatting rules provided by ruff
. However, you can customize the rules by modifying the [tool.ruff]
section of the pyproject.toml
file in the root of your project. for more about the configuration options, see the Ruff documentation.
If you are working in a virtual environment created in this repository, you automatically have access to Ruff
via just lint-py
and just fmt-python
commands to lint and format your code.
For more inspiration, see the GitLab Data Team’s Python Guide and Google’s Python Style Guide.
Example Usage
In the example below, we show how Python can be used to explore and visualize a dataset.
To follow along, you will need to work in a jupyter notebook with the right libraries installed in your environment. Don’t worry if you cannot do this now; we just want to show you what is possible here. We will revisit this example in Processing Data in Python.
Let’s load an example World Bank data via Gapminder using the causaldata package.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as sm
from causaldata import gapminder
Load the Gapminder data as a pandas DataFrame:
= gapminder.load_pandas().data df
We can check the dimensions of the DataFrame using df.info()
:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country 1704 non-null object
1 continent 1704 non-null object
2 year 1704 non-null int64
3 lifeExp 1704 non-null float64
4 pop 1704 non-null int64
5 gdpPercap 1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
Let’s take a look at the first few rows of the DataFrame using df.head()
:
df.head()
country | continent | year | lifeExp | pop | gdpPercap | |
---|---|---|---|---|---|---|
0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
Take a look at the relationship between GDP per Capita and Life Expectancy:
="gdpPercap", y="lifeExp", hue="continent", data=df).set(
sns.scatterplot(x="log", ylabel="Life Expectancy", xlabel="GDP per Capita"
xscale )
Separate the data by year, focusing on 1957 and 2007:
sns.relplot(=df.where(df["year"].isin([1957, 2007])),
data="gdpPercap",
x="lifeExp",
y="year",
col="continent",
hue=1,
col_wrap="scatter",
kind="muted",
paletteset(xscale="log", ylabel="Life Expectancy", xlabel="GDP per Capita") ).