Retrofitting STAC: Making a Schema Crosswalk for STAC API

Specifications may not be the most glamorous aspect of software, but they are essential. Just now, your web browser used the HTTP specification to fetch this blog post, which is written following the HTML specification using characters defined by the Unicode standard. These specifications allow you to view this blog post using Google Chrome, Mozilla Firefox, or even Lynx. The adoption of these open standards is fundamental to the open internet. Similarly, Kitware believes in openness, which is why we want to share the methods and best practices we used to adopt a new open specification, STAC (SpatioTemporal Asset Catalogs), into Resonant GeoData, our existing collection of open source web applications used to catalog and search geospatial data.

The STAC Specification

Some geospatial data describes information that varies in both space and time. For example, the Landsat program captures satellite images of the Earth multiple times a year and has been doing so for decades. Vector spatial data has a widely adopted open standard, GeoJSON, but GeoJSON doesn’t specify how to describe spatiotemporal data, that is, spatial data that also changes with time.

Also, GeoJSON doesn’t specify how to capture all the nuanced details of geospatial data. Auxiliary information, such as an image’s wavelength, is required to understand satellite imagery. Many openly available geospatial data formats provide different ways to represent this auxiliary information. This variability is a burden to scientists and software developers since the tools built to analyze one dataset will not be compatible with other datasets.

The STAC and STAC API specifications are two related efforts to provide consistency for representing and transmitting spatiotemporal data on the web. A client that supports STAC-formatted data can work with STAC data from any vendor.

A snippet of a STAC Collection, part of the STAC specification
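
For a rough idea of what such a document contains, here is a minimal sketch of a STAC Collection built as a Python dictionary and serialized to JSON. The field names follow the core STAC specification, but the id, description, and extent values are invented for illustration and do not describe a real catalog.

```python
import json

# Hypothetical collection: every value below is a placeholder.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-imagery",
    "description": "Example satellite imagery collection.",
    "license": "proprietary",
    "extent": {
        # Bounding box as [west, south, east, north] in WGS 84.
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        # Open-ended time range: a known start and no end date yet.
        "temporal": {"interval": [["2013-01-01T00:00:00Z", None]]},
    },
    "links": [],
}

print(json.dumps(collection, indent=2))
```

The extent is what makes the collection spatiotemporal: it pairs a bounding box with a time interval.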

Major agencies around the world, such as NASA and the European Space Agency (ESA), are using STAC to catalog massive datasets. See the STAC Index for links to popular STAC collections and STAC-compatible tools.

Resonant GeoData

Resonant GeoData is a suite of web applications that catalogs geospatial and geo-referenced data. It provides a powerful visual interface to geospatial data and gives you tools, such as faceted search and authorization logic, to explore and protect your data.

By adopting STAC, Resonant GeoData can interoperate with the large ecosystem of tools built for STAC. For example, you can use PySTAC Client to explore Resonant GeoData in your terminal. You could even hook up QGIS to work with Resonant GeoData via STAC.
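
Clients such as PySTAC Client ultimately issue HTTP requests against the STAC API's item search endpoint. As a rough sketch of what that interoperability means in practice, the snippet below builds (but does not send) such a search request using only the Python standard library. The base URL is a placeholder, not the address of a real Resonant GeoData deployment, and `build_search_request` is a helper name made up for this example.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the STAC API root of a real deployment.
STAC_API_ROOT = "https://example.com/api/stac"


def build_search_request(bbox, datetime_range, limit=10):
    """Build (but do not send) a STAC API item search request."""
    body = {
        "bbox": bbox,                # [west, south, east, north]
        "datetime": datetime_range,  # "start/end" interval of RFC 3339 timestamps
        "limit": limit,
    }
    return urllib.request.Request(
        f"{STAC_API_ROOT}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    bbox=[-74.1, 40.6, -73.7, 40.9],
    datetime_range="2021-01-01T00:00:00Z/2021-12-31T23:59:59Z",
)
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen would send it to a live server.
```

Because the request shape is defined by the specification rather than by the server, the same body should work against any STAC API that supports item search.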

Schema Crosswalk

Resonant GeoData has a carefully modeled computational representation of geospatial data that enables many features of our web application. These include efficient searches, simplified user authorization management, and support for extensible geospatial data types (e.g. raster imagery, full motion video, point clouds). Resonant GeoData can handle a massive quantity of geospatial data thanks to this internal representation.

The abstract relational data modeling used by Resonant GeoData will always fundamentally differ from the shape of data that STAC specifies. The challenge is to clearly and efficiently transform our representation of geospatial data into one that is STAC compliant.

A schema crosswalk maps fields from one representation of data to another. It’s a table, perhaps printed in ink on a piece of paper, that describes how one schema relates to another. Resonant GeoData uses this concept to translate our representation of geospatial data into one that is STAC compliant. But instead of ink and paper, we use three tools to define a precise, digital schema crosswalk:

  1. Bidirectional maps
  2. SQL
  3. Plain old Python

Bidirectional maps

Our tool of choice is the bidirectional map. A regular dictionary maps information from point A to point B, but a bidirectional map can also tell you how to get back from B to A. For that to work, the mapping must be one-to-one: every value has to be unique.

Using a bidirectional map to look up states and capitals
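
A minimal sketch of the idea, assuming a small hand-written class rather than whatever implementation Resonant GeoData actually uses (the `bidict` package on PyPI offers a fuller one). The class name and the sample states are chosen purely for illustration.

```python
class BidirectionalMap:
    """A one-to-one mapping that supports lookup in both directions."""

    def __init__(self, pairs):
        self._forward = dict(pairs)
        self._inverse = {value: key for key, value in self._forward.items()}
        # A value shared by two keys would make the reverse lookup ambiguous.
        if len(self._inverse) != len(self._forward):
            raise ValueError("values must be unique")

    def __getitem__(self, key):
        return self._forward[key]

    @property
    def inverse(self):
        return self._inverse


capitals = BidirectionalMap({
    "New York": "Albany",
    "North Carolina": "Raleigh",
    "New Mexico": "Santa Fe",
})

print(capitals["New York"])         # Albany
print(capitals.inverse["Raleigh"])  # North Carolina
```

The mapping is written down once, and both directions are derived from it, so the two lookups cannot drift out of sync.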

This makes it easier to support both input and output of STAC data. For instance, we use bidirectional maps to translate a numerical band range into a “common band name”, as described in the STAC Electro-Optical Extension Specification, and to translate a common band name back into its range.
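
As a sketch of that band translation, the snippet below keeps one table of wavelength ranges and derives the reverse lookup from it. The ranges shown (in micrometers) are illustrative and should be checked against the Electro-Optical Extension before being relied on; the function names are made up for this example, and a real implementation would likely need tolerance-based matching rather than exact tuple equality.

```python
# Wavelength ranges in micrometers keyed to common band names.
# Illustrative values only; verify against the Electro-Optical Extension.
RANGE_TO_NAME = {
    (0.45, 0.50): "blue",
    (0.50, 0.60): "green",
    (0.60, 0.70): "red",
}

# The reverse direction is derived, never written by hand.
NAME_TO_RANGE = {name: band_range for band_range, name in RANGE_TO_NAME.items()}


def to_common_name(min_um, max_um):
    """Return the common band name for an exact range, or None."""
    return RANGE_TO_NAME.get((min_um, max_um))


def to_range(common_name):
    """Return the (min, max) range for a common band name, or None."""
    return NAME_TO_RANGE.get(common_name)


print(to_common_name(0.60, 0.70))  # red
print(to_range("blue"))            # (0.45, 0.5)
```

The first function serves output (internal ranges to STAC names) and the second serves input (STAC names back to internal ranges).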

SQL

Such an elegant way to describe the schema crosswalk is not always available, so we use SQL when things get messier. A single declarative “sentence” of SQL can serve as a clear and expressive source of truth for the crosswalk. In addition, well-engineered SQL databases, such as PostgreSQL, can formulate highly optimized execution plans for even complicated queries. The result is a performant and scalable implementation of our STAC API.

A sample of our SQL statement expressed via the Django ORM
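
To illustrate the idea of a crosswalk written as one declarative statement, here is a small self-contained sketch. It uses Python's built-in `sqlite3` module instead of the Django ORM and PostgreSQL so that it runs anywhere, and the `raster` and `image_file` tables, their columns, and the output aliases are all hypothetical rather than Resonant GeoData's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE raster (id INTEGER PRIMARY KEY, name TEXT, acquired TEXT);
    CREATE TABLE image_file (
        id INTEGER PRIMARY KEY,
        raster_id INTEGER REFERENCES raster(id),
        path TEXT
    );
    INSERT INTO raster VALUES (1, 'scene-a', '2021-06-01T00:00:00Z');
    INSERT INTO image_file VALUES (10, 1, 'scene-a/red.tif');
    INSERT INTO image_file VALUES (11, 1, 'scene-a/nir.tif');
""")

# One statement: internal columns on the left, STAC-style names on the right.
CROSSWALK_QUERY = """
    SELECT
        raster.name     AS item_id,
        raster.acquired AS "datetime",
        image_file.id   AS asset_id,
        image_file.path AS asset_file
    FROM raster
    JOIN image_file ON image_file.raster_id = raster.id
    ORDER BY image_file.id
"""

rows = [dict(row) for row in conn.execute(CROSSWALK_QUERY)]
for row in rows:
    print(row)
```

Reading the `SELECT` list top to bottom gives the crosswalk itself: each line pairs an internal column with the name it takes on the STAC side.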

Plain Old Python

The above two methods take us pretty much to the finish line, but there are some aspects of the schema crosswalk that don’t belong in SQL. Let’s take a look at the STAC Item “assets” dictionary as an example.

Response from our SQL query
The final STAC-compliant asset object

The response to our SQL query provides the assets as a list, each uniquely identified by “id”, but STAC specifies the assets should be a dictionary with this “id” as a key. This kind of transformation is possible in SQL, but it adds significant complexity to the query. The maintainability and readability of the SQL query are paramount, so we prefer to perform this translation in a few lines of Python instead.
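
The list-to-dictionary step might look like the following sketch. The row shape and the helper name `assets_to_dict` are assumptions made for illustration; only the idea of keying each asset by its “id” comes from the description above.

```python
# Hypothetical shape of the asset rows returned by the SQL query.
asset_rows = [
    {"id": "red", "href": "https://example.com/scene-a/red.tif", "title": "Red band"},
    {"id": "nir", "href": "https://example.com/scene-a/nir.tif", "title": "Near infrared band"},
]


def assets_to_dict(rows):
    """Key each asset by its id and drop the id from the asset body."""
    return {
        row["id"]: {key: value for key, value in row.items() if key != "id"}
        for row in rows
    }


assets = assets_to_dict(asset_rows)
print(sorted(assets))  # ['nir', 'red']
```

A single dictionary comprehension does the work, which is easier to read and maintain than the equivalent aggregation inside the query.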

Other things are simply not possible in SQL. For example, we don’t keep track of the file type in the database, determining it instead from the file itself. The “file” is an identifier for an object stored in S3, for which we create a presigned URL by communicating with the S3 storage provider. All of these things must be done in Python rather than SQL.
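
A rough sketch of that last step, under two loudly stated assumptions: the media type is guessed from the object key's extension with the standard library's `mimetypes` module (the post determines it from the file itself), and the presigned URL comes from a caller-supplied function. `build_asset`, `presign`, and `fake_presign` are names invented here; with boto3, `presign` could wrap the S3 client's `generate_presigned_url` method.

```python
import mimetypes


def build_asset(object_key, presign):
    """Combine a media-type guess and a presigned URL into an asset body.

    `presign` is a hypothetical callable that turns an object key into a
    time-limited URL.
    """
    media_type, _encoding = mimetypes.guess_type(object_key)
    asset = {"href": presign(object_key)}
    if media_type is not None:
        asset["type"] = media_type
    return asset


def fake_presign(object_key):
    """Stand-in for a real storage client so the sketch runs offline."""
    return (
        "https://example-bucket.s3.amazonaws.com/"
        f"{object_key}?X-Amz-Signature=placeholder"
    )


asset = build_asset("scene-a/red.tif", fake_presign)
print(asset)
```

Injecting the presigning function keeps the storage provider out of the crosswalk logic and makes the step easy to exercise without network access.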

Tomorrow’s STAC and Resonant GeoData

The STAC and STAC API specifications are continually growing. While the core STAC specification is reasonably stable, the STAC API specification is still in active development. At the time of writing, Resonant GeoData follows one of the latest releases of the STAC API specification, v1.0.0-beta.5. Things are moving fast, and we’re excited to continue following along with the STAC API development to its future 1.0.0 stable release.

And that’s just the beginning. Beyond the core specification are many STAC extensions that are currently being developed. We plan to implement even more of these extensions in Resonant GeoData for our customers.

If you have particularly complex data and are interested in advanced support or customization of Resonant GeoData, Kitware can help. Sometimes an open standard can’t handle edge cases specific to the complex data you may have. We’re excited to learn about and help you solve your problems, perhaps even including new work on more schema crosswalks! Contact us at kitware@kitware.com to speak with someone on our team.


We recently gave a talk on making this schema crosswalk at the 2022 Cloud-Native Geospatial Outreach Event. Check out the video here.
