Kitware is excited to announce Resonant, a new open-source entry point into the data and analytics space. In recent years, there have been many advances in cloud computing, large-scale analytics tools, and Web standards. However, we found a lack of fully open, extensible, deployable, and integrative tools for researchers. To address this issue, we decided to build an open platform for reproducible end-to-end data science, called Resonant, which fully embraces the modern Web, Big Data technologies, and our software prowess. This article introduces the software components and use cases in Resonant’s three main areas of focus: data management, visualization, and analysis.
Data Management with Girder
From medical image archives to supercomputing simulation data, we are no strangers to large data sources. What we needed was a coherent, modern framework for managing data of all sorts, along with associated metadata, analyses, and visualizations. We built on our experience with the Midas data management system to architect a platform focused on supporting Big Data, modern Web technologies, and extensibility. The resulting production-ready data system is called Girder.
Out of the box, Girder supports scalable storage in Hadoop Distributed File System (HDFS), Amazon Simple Storage Service (S3), and MongoDB. Girder also supports arbitrary metadata, as well as server-side and client-side plugins. Girder allows for remote scalable job execution, Ansible provisioning, simple pip installation, and Google account authentication. Girder also enables management of users, sharing, groups, and quotas. Each of Girder’s features is free and open source with a permissive Apache v2 license.
Girder is being put to good use in the COVALIC project. Along with our academic and industry partners, we have combined Girder with decades of high-performance computing and medical imaging experience to produce a platform for hosting data-driven challenges. The entire system is built as a Girder plugin, using Girder’s authentication, storage, and data models to produce a site for designing and hosting challenges and for collecting submissions from participants. Girder uses Resonant’s execution engine, Romanesco, to perform custom evaluations of the submissions and produce a ranked leaderboard for each challenge phase.
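To make the evaluate-then-rank step concrete, here is a minimal Python sketch of how scored submissions could be turned into a ranked leaderboard. It is not COVALIC or Romanesco code: the `Submission` class, the `rank_submissions` function, and the choice of a mean score where higher is better are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Submission:
    """One participant's results for a challenge phase (hypothetical shape)."""
    participant: str
    scores: tuple  # per-case metric values; higher is assumed to be better


def rank_submissions(submissions):
    """Return (rank, participant, mean score) rows, best first.

    Tied scores share a rank, and the next rank is skipped ("1, 1, 3").
    """
    scored = sorted(
        ((mean(s.scores), s.participant) for s in submissions),
        key=lambda row: (-row[0], row[1]),  # best score first, then by name
    )
    leaderboard = []
    for position, (score, participant) in enumerate(scored, start=1):
        if leaderboard and leaderboard[-1][2] == score:
            rank = leaderboard[-1][0]  # tie: reuse the previous rank
        else:
            rank = position
        leaderboard.append((rank, participant, score))
    return leaderboard


submissions = [
    Submission("team-a", (0.5, 0.75)),
    Submission("team-b", (1.0, 0.5)),
    Submission("team-c", (0.25, 0.25)),
]
for rank, participant, score in rank_submissions(submissions):
    print(rank, participant, score)
```

A real challenge would replace the mean with its own evaluation metric; the ranking logic stays the same as long as each submission reduces to one comparable number.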
Visualization with GeoJS
Another task for analytics today is hosting visualizations for large and heterogeneous data sources. The Web is a natural place for this data, but we found scalable Web tools lacking, especially in the area of geoinformatics. Building on our experience with the Visualization Toolkit (VTK), OpenGL programming, and scalable systems such as ParaView, we created GeoJS, a library for Web visualization of geospatial and other primarily two-dimensional data sources.
For the Defense Advanced Research Projects Agency (DARPA) XDATA large-data visualization program, we built a custom application using GeoJS and Girder, called Minerva Taxi, which is shown below. The application is able to directly render and animate through over one million geolocated elements from open taxi and social media sources. Minerva also supports live streaming of data, such as data from a Twitter firehose, allowing frontend applications to react in real time to new data.
Since the large set of queried data is made available to the Web client, it is possible to interactively filter, brush, animate, and re-bin the data, which would normally require server-side processing. Together, these pieces form applications that enable analyst workflows by (1) easily incorporating the necessary data to solve an analyst’s problem; (2) rapidly returning search results against that data; and (3) displaying results (even if they are large) and derived metrics on a map interactively, including ancillary linked visualizations for non-map data.
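The filtering and re-binning described above are simple operations once the points are held in memory. The sketch below illustrates the idea in Python for consistency with the other examples in this article; it is not GeoJS or Minerva code, and the point layout (`lon`, `lat`, `t` keys) and the function names are assumptions.

```python
from collections import Counter


def filter_by_time(points, start, end):
    """Keep points whose timestamp falls in the half-open window [start, end)."""
    return [p for p in points if start <= p["t"] < end]


def bin_points(points, cell_size):
    """Count points per square grid cell with a side of `cell_size` degrees."""
    counts = Counter()
    for p in points:
        # Floor division maps each coordinate to the index of its grid cell.
        cell = (int(p["lon"] // cell_size), int(p["lat"] // cell_size))
        counts[cell] += 1
    return counts


points = [
    {"lon": -73.9, "lat": 40.7, "t": 10},
    {"lon": -73.6, "lat": 40.6, "t": 20},
    {"lon": -73.2, "lat": 40.9, "t": 30},
]

coarse = bin_points(points, 0.5)   # two points share a coarse cell
fine = bin_points(points, 0.25)    # re-binning at a finer size separates them
recent = filter_by_time(points, 15, 35)
print(len(coarse), len(fine), len(recent))
```

Because both functions are a single pass over the loaded points, changing the time window or the cell size does not require another query to the server.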
Analysis with Romanesco
Romanesco comes paired with Resonant Flow, a Web application for editing and executing analyses and workflows. (See the figure below.) The application uses Girder to store both data and analyses and to manage the remote execution of analyses through Celery, a popular Python framework for distributed task management. Resonant Flow was primarily developed for the National Science Foundation (NSF) Arbor Workflows project, which provides new ways for phylogenetics researchers to share code and data, as well as to educate others on new methods.
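A workflow in this sense is a set of analyses whose outputs feed one another, so executing it means running each step only after the steps it depends on. The following Python sketch shows that idea with the standard library alone. It runs in-process rather than through Celery, and it does not use Romanesco's API: `run_workflow` and the step layout are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, inputs):
    """Run steps in dependency order and return every named result.

    `steps` maps a step name to (function, [argument names]). Each argument
    name refers to a workflow input or to the output of another step.
    """
    # Only dependencies on other steps constrain the order; inputs are ready.
    graph = {
        name: [dep for dep in deps if dep in steps]
        for name, (_, deps) in steps.items()
    }
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


steps = {
    "average": (lambda total, cleaned: total / len(cleaned), ["total", "cleaned"]),
    "total": (sum, ["cleaned"]),
    "cleaned": (lambda values: [v for v in values if v is not None], ["raw"]),
}
results = run_workflow(steps, {"raw": [2, None, 4, 6]})
print(results["average"])
```

The steps are deliberately listed out of order to show that the sorter, not the declaration order, decides what runs first. In a distributed setting, each step would be handed to a task queue instead of being called directly.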
Rapid Web Applications with Tangelo
The Future of Resonant
As adherents and advocates of the open-source philosophy, we want to put the future of Resonant in your hands. Please reach out to email@example.com if you have any ideas for possible features or collaborations; we would be happy to discuss them. If you want to get your hands dirty and try out Resonant for yourself, visit the “Getting Started” section of our site at http://resonant.kitware.com.
 Kitware, Inc. “Girder: A Data Management Platform.” Girder. http://girder.readthedocs.org.
 Kitware, Inc. “COVALIC.” COVALIC. https://challenge.kitware.com.
 Kitware, Inc. “Welcome to GeoJS’s Documentation!” GeoJS. http://geojs.readthedocs.org/en/latest/index.html.
 Kitware, Inc. “Romanesco: A simple, flexible execution engine.” Romanesco. http://romanesco.readthedocs.org/en/master.
 Arbor Revolutionary Workflow. “Home.” Arbor: evolutionary workflows for the tree of life. http://www.arborworkflows.com.
 Kitware, Inc. “Welcome to the Tangelo Web Framework!” Tangelo Web Framework. http://tangelo.readthedocs.org.
Jeff Baumes is a Technical Leader and data scientist at Kitware. His primary responsibility is to create tools that effectively visualize large and complex data, spanning relational, geospatial, temporal, bioinformatics, financial, and textual data.
Roni Choudhury is a research and development engineer at Kitware. He has directed the design and development of Tangelo from the ground up to bring advanced and experimental information visualization techniques to the web.
Patrick Reynolds is a research and development engineer at Kitware. He works within Kitware’s Medical Imaging, Computer Vision, and Data and Analytics teams, finding ways for these groups to better enable each other.
Jonathan Beezley is a research and development engineer at Kitware, where he is one of the principal developers of GeoJS. His research interests include geospatial visualization, Web technologies, and computational statistics.
Aashish Chaudhary is a Technical Leader on the Scientific Computing team at Kitware. Prior to joining Kitware, he developed a graphics engine and open-source tools for information and geospatial visualization. His interests include software engineering, rendering, and visualization.
David Manthey is a research and development engineer at Kitware. He has experience working in the field of computer vision on audio and video distribution, storage, and processing, ranging from direct hardware control to user interfaces.
Zach Mullen is a research and development engineer at Kitware. His areas of interest include Big Data management and analysis, scientific visualization, quality software process, and computer security.