Kitware’s Data and Analytics Team is building an open source software platform called OASIS, Open source AI Software Infrastructure for Science, for researchers in the scientific domain. OASIS provides a data API to ingest and serve scientific data formats and annotations within AI workflows. It integrates the coupling of data to AI models, scalable training, and cloud deployment into a cohesive web and command-line interface, along with state-of-the-art techniques to debug and improve AI models.
Though AI has been very successful in the consumer market, its growth has lagged in the scientific domain. Clunky infrastructure inhibits fast iteration, and current AI tools are hardcoded to industrial data types, so managing and consuming scientific data in AI pipelines takes a great deal of effort. In addition, the availability of large scientific datasets and AI’s need for large training datasets require model training to run at scale. Training models and running inference on cloud infrastructure force domain experts to work with a complex software stack composed of many pieces, such as containerization, one or more machine learning frameworks, and I/O packages.
OASIS makes it easy to consume 2D to 4D datasets by providing new data interfaces that bridge AI tools to scientific data formats. Out of the box, it provides an integrated web interface to upload raw data and annotations, serialize and upload models, and run training and inference at scale, along with visualization tools that help users understand model outcomes and performance. Kitware can also tailor the OASIS platform to meet specific needs, including custom user interfaces, workflows, and more. We can also work with different cloud providers, including Amazon AWS. A version of the platform customized to a team's data and workflows lets its users work more efficiently.
The OASIS Framework
In Phase I of the Department of Energy SBIR project, Kitware prototyped the OASIS data API and the AI workbench. To support the data API on the backend, we developed the extract-transform-load (ETL) pipeline shown in the figure below.
Figure 1: The OASIS ETL pipeline for remote sensing and geospatial imagery uses a combination of open source tools and technologies. It uses FUSE for random on-the-fly data access, Celery for distributed execution of ETL tasks, a Postgres database for storing metadata, and AI API-specific Python loaders for serving data inside AI models.
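To make the extract, transform, and load stages concrete, here is a minimal sketch of the pattern. It is not OASIS code: the toy text "raster" format, the `raster_metadata` table, and every function name are assumptions made for illustration. SQLite stands in for the Postgres metadata store, and the FUSE and Celery layers are left out.

```python
import sqlite3
from pathlib import Path


def extract(path: Path) -> list[list[int]]:
    """Extract: parse a whitespace-separated grid of integers (a toy 'raster')."""
    lines = path.read_text().splitlines()
    return [[int(v) for v in line.split()] for line in lines if line.strip()]


def transform(path: Path, grid: list[list[int]]) -> dict:
    """Transform: derive the metadata record that will be indexed."""
    values = [v for row in grid for v in row]
    return {
        "path": str(path),
        "height": len(grid),
        "width": len(grid[0]),
        "min_value": min(values),
        "max_value": max(values),
    }


def load(conn: sqlite3.Connection, record: dict) -> None:
    """Load: store one metadata row (SQLite is a stand-in for Postgres)."""
    conn.execute(
        "INSERT INTO raster_metadata "
        "VALUES (:path, :height, :width, :min_value, :max_value)",
        record,
    )


def run_etl(paths: list[Path], conn: sqlite3.Connection) -> None:
    """Run extract -> transform -> load for each input file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raster_metadata ("
        "path TEXT PRIMARY KEY, height INTEGER, width INTEGER, "
        "min_value INTEGER, max_value INTEGER)"
    )
    for path in paths:
        load(conn, transform(path, extract(path)))
    conn.commit()


def find_rasters(conn: sqlite3.Connection, min_width: int) -> list[str]:
    """Metadata query a data loader could use to select its inputs."""
    rows = conn.execute(
        "SELECT path FROM raster_metadata WHERE width >= ? ORDER BY path",
        (min_width,),
    )
    return [row[0] for row in rows]
```

In this sketch, a loader would call `find_rasters` to decide which files to read and then open only those files, which is the role the metadata database plays in the pipeline described above.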
We created the AI workbench of OASIS using the open source Django and Vue.js frameworks and Kitware-developed extensions to these frameworks such as Resonant GeoData and Girder. To support data visualization of raw imagery, inputs, and outputs, we integrated Kitware's open source tools VTK.js and GeoJS into the web workbench.
The team has developed OASIS task execution capabilities using the Python Celery framework to manage long-running ETL routines, model training, and inference jobs. OASIS’s design is modular and extensible and can work with other workflow systems like Argo. The figure below shows the Phase I prototype of the AI workbench that successfully ran AI models from our open source VIAME effort.
Figure 2: The OASIS AI workbench enables users to upload model containers or use containers hosted on DockerHub to run against cloud-hosted and Resonant GeoData-managed datasets.
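The task execution pattern described above (submit a long-running job, then check on it later) can be sketched with the Python standard library. This is an illustration only: it swaps in `concurrent.futures.ThreadPoolExecutor` for Celery so that it runs without a message broker, and `JobRunner` and `train_model` are hypothetical names, not part of OASIS or Celery.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train_model(dataset_id: str, epochs: int) -> dict:
    """Hypothetical long-running job; short sleeps stand in for real training."""
    for _ in range(epochs):
        time.sleep(0.01)
    return {"dataset_id": dataset_id, "epochs": epochs, "status": "finished"}


class JobRunner:
    """Tiny in-process job queue (a stand-in for a Celery worker pool)."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}
        self._next_id = 0

    def submit(self, func, *args) -> int:
        """Queue a job and return an id the caller can poll later."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: int) -> str:
        """Report pending, running, finished, or failed without blocking."""
        future = self._jobs[job_id]
        if future.done():
            return "failed" if future.exception() else "finished"
        return "running" if future.running() else "pending"

    def result(self, job_id: int, timeout: float = None):
        """Block until the job completes and return its value (or re-raise)."""
        return self._jobs[job_id].result(timeout=timeout)

    def close(self) -> None:
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A caller would run `job_id = runner.submit(train_model, "dataset-001", 3)`, return control to the user interface immediately, and poll `runner.status(job_id)` until the job finishes. A distributed task queue such as Celery follows the same submit-and-poll shape, with workers running on other machines.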
To support model analysis, we are in the process of integrating work from Kitware’s DARPA Explainable AI (XAI) effort into the OASIS workbench for the generation and visualization of saliency maps (e.g., occlusion maps).
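For readers unfamiliar with occlusion maps, the idea can be sketched in a few lines: mask one patch of the input at a time and record how much the model's score drops. The code below is a simplified illustration, not Kitware's XAI implementation. The function names, the non-overlapping patch layout, and the `toy_score` "model" are all assumptions chosen to keep the example self-contained.

```python
def occlusion_saliency(image, score_fn, patch=2, baseline=0.0):
    """Return a per-pixel map of score drops caused by masking each patch.

    image:    2D list of floats (rows of pixel values)
    score_fn: callable taking an image and returning a single float score
    patch:    side length of the square occlusion patch (non-overlapping)
    baseline: value written into the occluded patch
    """
    height, width = len(image), len(image[0])
    base_score = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]

    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))

            # Copy the image and blank out the current patch.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = baseline

            # A large drop means the model relied on this region.
            drop = base_score - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def toy_score(image):
    """Hypothetical model: responds only to brightness in the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2))
```

Applied to a uniform 4x4 image with `toy_score`, only the top-left patch receives a nonzero saliency value, because that is the only region the toy model looks at. A real workflow would pass a trained model's class score as `score_fn`.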
Finally, OASIS uses Kubernetes, an open source container orchestration system for automating software deployment, scaling, and management.
OASIS is released publicly on GitHub, where it is continuously updated with improvements. If you have questions about using OASIS or are interested in customizing the platform, request a meeting with Kitware’s data and analytics experts. Our team works to help customers deliver their next data and AI workflows using our expertise in emerging web and software infrastructure technologies.
Kitware’s Data and Analytics Capabilities
Kitware’s Data and Analytics Team works to help customers discover and implement state-of-the-art data and AI workflows. We have expertise in emerging web and software infrastructure technologies and apply it to your project. Powered by Kitware’s open source platforms, our team deploys scientific systems that support scalable uploads and downloads of large files. Our web-based scientific and geospatial visualizations are also scalable. We can perform scientific data ingest, validation, and modeling with flexible metadata queries. Kitware’s DIVE platform provides a machine learning annotation workflow and supports data management, user interfaces, and job execution. This allows us to support asynchronous processing integration for scientific and AI workflows, along with complex annotation workflow interfaces. To learn more about the capabilities Kitware has to offer, visit our website.
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Acquisition and Assistance, under Award Number DE-SC0021596.
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.