Analyzing GBIF (Global Biodiversity Information Facility) Data
This project analyzed Species at Risk observations within Canadian national parks using occurrence records from the Global Biodiversity Information Facility (GBIF). Using Python geospatial tools, Northern Leopard Frog observations were retrieved, cleaned, and spatially filtered to identify records occurring within park boundaries. The analysis explored temporal trends in species observations and spatial conservation patterns.
Project Workflow
Step 01
Retrieve Species Occurrence Data from GBIF
Species occurrence records for the Northern Leopard Frog (Lithobates pipiens) were retrieved from the Global Biodiversity Information Facility (GBIF) using the pygbif API. Records from two time periods (2014–2015 and 2024–2025) were collected for comparison and converted into a tabular dataset for further analysis.
from pygbif import occurrences

# Query GBIF for georeferenced Northern Leopard Frog records; the same
# call was repeated with year="2014,2015" for the earlier period.
results = occurrences.search(
    scientificName="Lithobates pipiens",
    year="2024,2025",     # inclusive year range
    hasCoordinate=True,   # keep only records with coordinates
    limit=300
)
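The search call returns a dictionary, not a table, so the record list still needs to be flattened before analysis. A minimal sketch of that conversion, using a made-up sample that mimics the shape of the `occurrences.search` response (a dict with a `"results"` list of occurrence records):

```python
import pandas as pd

# Hypothetical sample mimicking the structure of occurrences.search() output.
results = {
    "count": 2,
    "endOfRecords": True,
    "results": [
        {"scientificName": "Lithobates pipiens", "decimalLatitude": 49.9,
         "decimalLongitude": -97.1, "year": 2024},
        {"scientificName": "Lithobates pipiens", "decimalLatitude": 50.4,
         "decimalLongitude": -104.6, "year": 2025},
    ],
}

# Flatten the record list into a tabular dataset and drop any rows
# missing coordinates, since later steps need point geometries.
df = pd.DataFrame(results["results"])
df = df.dropna(subset=["decimalLatitude", "decimalLongitude"])
```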
Step 02
Convert Occurrence Data to Spatial Format
The occurrence records were converted into a GeoDataFrame using latitude and longitude coordinates, enabling spatial analysis within Python. This step prepared the data for mapping and spatial operations with other geospatial datasets.
import geopandas as gpd
from shapely.geometry import Point

# Build point geometries from GBIF's longitude/latitude columns;
# EPSG:4326 (WGS 84) is the CRS GBIF coordinates are reported in.
geometry = [Point(xy) for xy in zip(df["decimalLongitude"], df["decimalLatitude"])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
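Because EPSG:4326 coordinates are in degrees, distance-based steps later in the workflow (such as density estimation with a bandwidth in metres) work better after reprojection to a metre-based CRS. A minimal sketch with made-up coordinates, using EPSG:3978 (NAD83 / Canada Atlas Lambert) as one reasonable Canada-wide choice:

```python
import geopandas as gpd
import pandas as pd

# Illustrative coordinates; column names follow GBIF's convention.
df = pd.DataFrame({
    "decimalLongitude": [-97.1, -104.6],
    "decimalLatitude": [49.9, 50.4],
})
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["decimalLongitude"], df["decimalLatitude"]),
    crs="EPSG:4326",  # GBIF coordinates are WGS 84
)

# Reproject so that distances and bandwidths are expressed in metres.
gdf_m = gdf.to_crs("EPSG:3978")
```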
Step 03
Identify Observations Within National Parks
National park boundaries were loaded as a spatial dataset and a spatial join was used to identify frog observations located within park boundaries. This allowed park-level summaries and conservation-focused spatial analysis.
# Both layers must share a CRS before the join; "within" keeps only
# observations that fall inside a park polygon.
gdf_in_parks = gpd.sjoin(
    gdf,
    parks_gdf,
    how="inner",
    predicate="within"
)
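The park-level summaries mentioned above follow directly from the join result with a groupby. A self-contained sketch with toy geometries (the `park_name` column is illustrative; real park boundary datasets use their own field names):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data: two square "parks" and three observation points.
parks_gdf = gpd.GeoDataFrame(
    {"park_name": ["Park A", "Park B"]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
    crs="EPSG:4326",
)
obs = gpd.GeoDataFrame(
    {"obs_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(2.5, 2.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Inner spatial join drops the point outside both parks.
in_parks = gpd.sjoin(obs, parks_gdf, how="inner", predicate="within")

# Park-level summary: observation counts per park.
counts = in_parks.groupby("park_name").size()
```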
Step 04
Explore Spatial Patterns in Observations
Spatial analysis techniques were applied to explore patterns in frog observations. Kernel density estimation was used to identify clusters of observations and assess potential hotspots within protected areas.
import numpy as np
from sklearn.neighbors import KernelDensity

# Coordinates should come from a projected CRS (metres), so bandwidth=500
# corresponds to a 500 m kernel rather than 500 degrees.
coords = np.column_stack([gdf_in_parks.geometry.x, gdf_in_parks.geometry.y])
kde = KernelDensity(bandwidth=500)
kde.fit(coords)
densities = np.exp(kde.score_samples(coords))
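The temporal comparison between the two study periods noted in the overview can be sketched with plain pandas, using GBIF's `year` field. The records below are illustrative, not real observation counts:

```python
import pandas as pd

# Illustrative records carrying a "year" field, mimicking GBIF output.
df = pd.DataFrame({"year": [2014, 2015, 2015, 2024, 2024, 2025]})

# Label each record by study period, then compare observation counts;
# pd.cut bins are right-inclusive, so (2013, 2015] covers 2014-2015.
period = pd.cut(df["year"], bins=[2013, 2015, 2025],
                labels=["2014-2015", "2024-2025"])
counts = df.groupby(period, observed=True).size()
```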