Analyzing GBIF (Global Biodiversity Information Facility) Data
This project analyzed Species at Risk observations within Canadian national parks using occurrence records from the Global Biodiversity Information Facility (GBIF). Using Python geospatial tools, Northern Leopard Frog observations were retrieved, cleaned, and spatially filtered to identify records occurring within park boundaries. The analysis explored temporal trends in species observations and spatial conservation patterns.
Project Workflow
Step 01
Retrieve Species Occurrence Data from GBIF
Species occurrence records for the Northern Leopard Frog (Lithobates pipiens) were retrieved from the Global Biodiversity Information Facility (GBIF) using the pygbif API. Records from two time periods (2014–2015 and 2024–2025) were collected for comparison and converted into a tabular dataset for further analysis.
from pygbif import occurrences

# Query GBIF for georeferenced Northern Leopard Frog records; the same
# call was repeated with year="2014,2015" for the earlier period.
results = occurrences.search(
    scientificName="Lithobates pipiens",
    year="2024,2025",     # inclusive year range
    hasCoordinate=True,   # keep only records with coordinates
    limit=300
)
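The search call returns a dictionary, not a table, so the record list still needs to be flattened before analysis. A minimal sketch of that conversion, using a made-up sample that mimics the shape of the `occurrences.search` response (a dict with a `"results"` list of occurrence records):

```python
import pandas as pd

# Hypothetical sample mimicking the structure of occurrences.search() output.
results = {
    "count": 2,
    "endOfRecords": True,
    "results": [
        {"scientificName": "Lithobates pipiens", "decimalLatitude": 49.9,
         "decimalLongitude": -97.1, "year": 2024},
        {"scientificName": "Lithobates pipiens", "decimalLatitude": 50.4,
         "decimalLongitude": -104.6, "year": 2025},
    ],
}

# Flatten the record list into a tabular dataset and drop any rows
# missing coordinates, since later steps need point geometries.
df = pd.DataFrame(results["results"])
df = df.dropna(subset=["decimalLatitude", "decimalLongitude"])
```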
Step 02
Convert Occurrence Data to Spatial Format
The occurrence records were converted into a GeoDataFrame using latitude and longitude coordinates, enabling spatial analysis within Python. This step prepared the data for mapping and spatial operations with other geospatial datasets.
import geopandas as gpd
from shapely.geometry import Point

# Build point geometries from GBIF's longitude/latitude columns;
# EPSG:4326 (WGS 84) is the CRS GBIF coordinates are reported in.
geometry = [Point(xy) for xy in zip(df["decimalLongitude"], df["decimalLatitude"])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
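Because EPSG:4326 coordinates are in degrees, distance-based steps later in the workflow (such as density estimation with a bandwidth in metres) work better after reprojection to a metre-based CRS. A minimal sketch with made-up coordinates, using EPSG:3978 (NAD83 / Canada Atlas Lambert) as one reasonable Canada-wide choice:

```python
import geopandas as gpd
import pandas as pd

# Illustrative coordinates; column names follow GBIF's convention.
df = pd.DataFrame({
    "decimalLongitude": [-97.1, -104.6],
    "decimalLatitude": [49.9, 50.4],
})
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["decimalLongitude"], df["decimalLatitude"]),
    crs="EPSG:4326",  # GBIF coordinates are WGS 84
)

# Reproject so that distances and bandwidths are expressed in metres.
gdf_m = gdf.to_crs("EPSG:3978")
```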
Step 03
Identify Observations Within National Parks
National park boundaries were loaded as a spatial dataset and a spatial join was used to identify frog observations located within park boundaries. This allowed park-level summaries and conservation-focused spatial analysis.
# Both layers must share a CRS before the join; "within" keeps only
# observations that fall inside a park polygon.
gdf_in_parks = gpd.sjoin(
    gdf,
    parks_gdf,
    how="inner",
    predicate="within"
)
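The park-level summaries mentioned above follow directly from the join result with a groupby. A self-contained sketch with toy geometries (the `park_name` column is illustrative; real park boundary datasets use their own field names):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data: two square "parks" and three observation points.
parks_gdf = gpd.GeoDataFrame(
    {"park_name": ["Park A", "Park B"]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
    crs="EPSG:4326",
)
obs = gpd.GeoDataFrame(
    {"obs_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(2.5, 2.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Inner spatial join drops the point outside both parks.
in_parks = gpd.sjoin(obs, parks_gdf, how="inner", predicate="within")

# Park-level summary: observation counts per park.
counts = in_parks.groupby("park_name").size()
```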
Step 04
Explore Spatial Patterns in Observations
Spatial analysis techniques were applied to explore patterns in frog observations. Kernel density estimation was used to identify clusters of observations and assess potential hotspots within protected areas.
import numpy as np
from sklearn.neighbors import KernelDensity

# Coordinates should come from a projected CRS (metres), so bandwidth=500
# corresponds to a 500 m kernel rather than 500 degrees.
coords = np.column_stack([gdf_in_parks.geometry.x, gdf_in_parks.geometry.y])
kde = KernelDensity(bandwidth=500)
kde.fit(coords)
densities = np.exp(kde.score_samples(coords))
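The temporal comparison between the two study periods noted in the overview can be sketched with plain pandas, using GBIF's `year` field. The records below are illustrative, not real observation counts:

```python
import pandas as pd

# Illustrative records carrying a "year" field, mimicking GBIF output.
df = pd.DataFrame({"year": [2014, 2015, 2015, 2024, 2024, 2025]})

# Label each record by study period, then compare observation counts;
# pd.cut bins are right-inclusive, so (2013, 2015] covers 2014-2015.
period = pd.cut(df["year"], bins=[2013, 2015, 2025],
                labels=["2014-2015", "2024-2025"])
counts = df.groupby(period, observed=True).size()
```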