Derived data products from GBIF snapshots.
Processed GBIF occurrence data indexed to H3 hexagons at resolutions 0-10, plus derived products.
GBIF releases new snapshots periodically. Releases are versioned by year-month (e.g., 2026-06) so multiple versions can coexist in the bucket during transitions. Always use the most recent release for new work.
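Since releases are named by zero-padded year-month, the newest one can be picked with a plain string comparison. The helper below is a minimal sketch: `latest_release` is a hypothetical name, and the input list stands in for whatever top-level prefixes a bucket listing returns.

```python
import re

# Release prefixes look like "2026-06"; anything else (e.g. "hex/") is ignored.
RELEASE_RE = re.compile(r"^(\d{4})-(\d{2})$")


def latest_release(prefixes):
    """Return the most recent YYYY-MM release among bucket prefixes, or None."""
    releases = []
    for prefix in prefixes:
        name = prefix.strip("/")
        match = RELEASE_RE.match(name)
        if match and 1 <= int(match.group(2)) <= 12:
            releases.append(name)
    # Zero-padded YYYY-MM strings sort chronologically, so max() is enough.
    return max(releases) if releases else None


print(latest_release(["hex/", "2025-06/", "2026-06/"]))  # prints 2026-06
```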
Path: s3://public-gbif/2026-06/hex/
Global GBIF occurrences (GBIF 2026-06-01 snapshot, 3,499,090,951 records) partitioned by H3 resolution-0 cell — 122 h0 partitions, one data_0.parquet file each. One row per occurrence; deduplicated by gbifid.
Schema: gbifid, datasetkey, occurrenceid, kingdom, phylum, class, order, family, genus, species, infraspecificepithet, taxonrank, scientificname, verbatimscientificname, verbatimscientificnameauthorship, countrycode, locality, stateprovince, occurrencestatus, individualcount, decimallatitude, decimallongitude, coordinateuncertaintyinmeters, coordinateprecision, elevation, depth, eventdate, day, month, year, taxonkey, specieskey, basisofrecord, institutioncode, collectioncode, catalognumber, recordnumber, identifiedby, dateidentified, license, rightsholder, recordedby, typestatus, establishmentmeans, lastinterpreted, mediatype, issue, h0 (INT64/UBIGINT, partition key), h1–h10 (UBIGINT).
Native resolution is h10 (one ~15,000 m² cell per point); h0–h9 are parent rollups.
The hex dataset is partitioned by h0 (geography), not by species, so a bare WHERE specieskey=… scans all 122 files. The small species-h0-index.parquet sidecar (columns specieskey, species, h0; ~22 MB) lists the h0 partitions in which each species occurs. Filtering on h0 with a subquery against the sidecar scopes the read in a single query: DuckDB turns the subquery into a dynamic partition filter and reads only those h0 files (median 1 of 122; about 2 s).
The species column is NULL for records identified only to genus or a coarser rank. These records are excluded from the sidecar, which keys on specieskey IS NOT NULL.
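The sidecar lookup described above can be sketched as a small query builder. This is an illustration under stated assumptions, not the project's own query: `build_species_query` is a hypothetical helper, the hex files are assumed to sit in hive-style `h0=<cell>/data_0.parquet` directories, the sidecar location is a placeholder because only its file name is given, and `1234567` is a made-up key.

```python
def build_species_query(specieskey, hex_glob, index_path):
    """Return DuckDB SQL that reads only the h0 partitions containing a species."""
    if isinstance(specieskey, bool) or not isinstance(specieskey, int):
        raise TypeError("specieskey must be an integer taxon key")
    key = int(specieskey)
    # The h0 IN (...) subquery is what lets DuckDB prune partitions:
    # only h0 values listed in the sidecar for this species are read.
    return f"""
SELECT gbifid, species, decimallatitude, decimallongitude, eventdate, h0
FROM read_parquet('{hex_glob}', hive_partitioning = true)
WHERE specieskey = {key}
  AND h0 IN (
      SELECT h0
      FROM read_parquet('{index_path}')
      WHERE specieskey = {key}
  )
""".strip()


# Assumed layout: one hive-style directory per h0 cell under the hex prefix.
HEX_GLOB = "s3://public-gbif/2026-06/hex/*/data_0.parquet"
# Placeholder: the sidecar's full location is not stated, only its file name.
INDEX_PATH = "<sidecar-prefix>/species-h0-index.parquet"

sql = build_species_query(1234567, HEX_GLOB, INDEX_PATH)  # made-up specieskey
print(sql)
```

With the duckdb Python package installed and S3 access configured, the string could be passed to `duckdb.sql(sql)`. Keeping the `specieskey` predicate on the outer query as well still matters, because a matching h0 partition contains many other species.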
s3://public-gbif/hex/ - original VARCHAR-keyed hex, deprecated.
s3://public-gbif/2025-06/hex/ - superseded by 2026-06; retained temporarily for backward compatibility, scheduled for removal once dependent apps repoint.
Prefix: s3://public-gbif/2026-06/taxonomy/ (partitioned by h0)
Aggregated counts of taxa within H3 resolution-0 hexagons.
Schema: kingdom, phylum, class, order, family, genus, species, infraspecificepithet, taxonrank, scientificname, verbatimscientificname, verbatimscientificnameauthorship, n (count), h0
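A query against this schema might total n per taxonomic rank, optionally restricted to one h0 cell. The sketch below is illustrative only: `build_rank_counts_query` is a hypothetical helper, the `**/*.parquet` file layout under the prefix is assumed, and n is assumed to be an occurrence count per taxon and cell. The rank column is double-quoted because `order` is a reserved word in SQL.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Assumed file layout under the documented prefix.
TAXONOMY_GLOB = "s3://public-gbif/2026-06/taxonomy/**/*.parquet"


def build_rank_counts_query(rank, h0=None, taxonomy_glob=TAXONOMY_GLOB):
    """Return DuckDB SQL summing n per value of one rank column."""
    if rank not in RANKS:
        raise ValueError(f"rank must be one of {RANKS}")
    col = f'"{rank}"'  # quoted so that "order" is parsed as a column name
    where = f"WHERE h0 = {int(h0)}\n" if h0 is not None else ""
    return (
        f"SELECT {col}, SUM(n) AS occurrences\n"
        f"FROM read_parquet('{taxonomy_glob}', hive_partitioning = true)\n"
        f"{where}"
        f"GROUP BY {col}\n"
        f"ORDER BY occurrences DESC"
    )


# h0 is stored as an integer; an H3 hex string converts with int(..., 16).
cell = int("8001fffffffffff", 16)  # a resolution-0 H3 index, for illustration
print(build_rank_counts_query("order", h0=cell))
```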
File: s3://public-gbif/redlined_cities_gbif.parquet
Spatial join of GBIF occurrences with "Mapping Inequality" (Redlining) polygons for US cities.
Schema: gbifid, scientificname, kingdom, phylum, class, order, family, genus, species, recordedby, date, coordinateuncertaintyinmeters, city, state, grade (A-D), residential, commercial, industrial
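As an example of using this schema, the sketch below builds a DuckDB query that compares record counts and distinct species across redlining grades for one city. `build_grade_summary_query` is a hypothetical helper, and the city and state values are illustrative: how those columns are spelled in the file is an assumption.

```python
REDLINING_PATH = "s3://public-gbif/redlined_cities_gbif.parquet"


def build_grade_summary_query(path=REDLINING_PATH):
    """Return parameterized DuckDB SQL summarizing records per grade."""
    return (
        "SELECT grade,\n"
        "       COUNT(*) AS n_records,\n"
        "       COUNT(DISTINCT species) AS n_species\n"
        f"FROM read_parquet('{path}')\n"
        "WHERE city = ? AND state = ?\n"  # values are bound at execution time
        "GROUP BY grade\n"
        "ORDER BY grade"
    )


sql = build_grade_summary_query()
params = ["Oakland", "CA"]  # illustrative values; actual spellings may differ
print(sql)
```

With the duckdb package installed and S3 access configured, `duckdb.execute(sql, params).fetchall()` would return one row per grade. COUNT(DISTINCT species) ignores NULL species, so records identified only to genus add to n_records but not to n_species.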
File: s3://public-gbif/taxa.parquet
Reference list of all taxa found in the dataset.