An Approach based on the Software Sustainability Matrix
In our previous blog, we introduced the Software Sustainability Matrix (SSM). In this blog, we expand on the SSM and describe an approach to producing a project’s SSM score. Given that much of the SSM is based on subjective assessments, we are fully aware that this approach cannot assign scores based on rigorous, objective criteria. However, in the spirit of a continuing conversation on this topic, we have found that the methodology described below is a valuable tool to assess and improve sustainability.
As described in the previous blog, the SSM is organized into four value areas: 1) Impact, 2) Risks, 3) Community, and 4) Technology. Each of the four areas is scored using multiple metrics. To produce the final SSM score, we first evaluate the metrics within each area to produce a value area score. We then use a weighted combination of the value area scores to produce a final score in the range of [0,100], with 100 being the most sustainable.
A detailed description of the four value areas, and associated metrics, is provided below. (Refer to the initial blog for a summary description of each of these areas.)
1. Impact. Impact can be measured in many different ways. Certainly, in today’s world, economic impacts are an obvious choice. However, in many cases, open source software does not directly or easily translate into economic measures. For example, societal benefit, entertainment, and advertising impacts may not directly produce a revenue stream but instead provide a value stream on which to monetize products and services. Estimating impact requires a certain amount of prognostication: In 1991, the perceived impact of Linux was probably zero, while today, the valuation of the Linux market is on the order of tens of billions of dollars.
The metrics used to assess impact are perceived value, the size of the user base, and the business model. For simplicity, we often estimate perceived value using standard market evaluation approaches. That is, we examine market size (or projected market size) and combine it with estimates of market adoption. Measures of the size of the user base, including downloads, size of mailing lists/forums, and search analytics, provide objective measures that quantify the current impact (although these measures are probably next to useless for valuing novel software technologies and niche applications). Finally, we typically consider the “business model” used to sustain software in the long term. There is no long-term business model supporting many software systems, except to tap the goodwill of the community – which for larger communities may be perfectly adequate. The long-term viability of research software is a concern for many since funding agencies typically support software systems for relatively short periods of time (often a decade or less). So for software systems that depend on just a few developers or the backing of a funding agency, the lack of any business model to carry the software may pose a significant risk to the long-term viability of the system.
2. Risks. Software systems face many risks throughout their lifecycle. Many software systems are initiated by a few key developers without any community support. The “bus factor” is a measure of how many developers would have to disappear (i.e., get hit by a bus) before a software system stalls and likely fails. IP issues must be taken seriously: the choice of a license can greatly impact the overall success of the project. As we argued in the first blog, an open source license is a fundamental requirement for long-term sustainability, as even the biggest corporations abandon software, go out of business, or are acquired and change terms. We are big fans of permissive licenses, such as BSD, Apache, and MIT variants, as these place few “scary” conditions on the licensee (the measure of scary is the likelihood that corporate lawyers freak out – reciprocal licenses like GPL tend to cause this reaction). IP issues related to patent and copyright infringement typically emerge when systems become larger and more successful, as monetization opportunities emerge and/or the systems become threats to competitive products. It is important that developers manage the contribution process to avoid the introduction of tainted code. Another important risk is competitive software systems that provide significantly overlapping capabilities. For long-term viability, it’s important that software distinguish itself from other systems. Finally, software systems depend on other software, whether to compile and build code, provide basic functionality (e.g., math functions), or even implement major subsystems (e.g., a DICOM I/O library). Thus the risk to sustainability can be strongly driven by the sustainability of underlying software components.
3. Community. Community has a significant effect on the long-term sustainability of software: certainly size matters (and to some extent is accounted for in the Impact value area), but so does culture. In particular, welcoming, inclusive, diverse cultures that encourage contributions and support from community members can attract amazing worldwide talent – since many individuals want to contribute in a meaningful way, work with other talented individuals, and have fun while doing it. Typically, governance, documentation, and outreach become necessary as systems grow larger. Governance is essential as communities grow in size: hard but necessary decisions about technical vision must be made, and interpersonal conflict must be managed in a fair and judicious manner. Probably the most important marker of a sustainable community is the sophistication of its software process. At a minimum, besides hosting in a public repository such as GitHub, the software should be regularly tested. Ideally, continuous integration should be established to manage the contributions from the community. Without regular testing, software is likely to break down over time, or as the saying goes, “If it’s not tested, it’s broken.” In fact, a clear marker of the sustainability of a system is the frequency of building, testing, and releasing software, with continuous approaches indicating the greatest community vitality.
4. Technology. Technology has to do with the implementation of a software system, such as its architecture, programming language(s) used, and interoperability. While high-value (i.e., impactful) software may be used despite relying on obsolete technology (business-critical COBOL anyone?), the use of outdated methods discourages adoption by community members (who are often excited by using the latest and greatest tools). It also opens the door to competing systems that use superior implementation approaches to provide the same capabilities with faster, simpler, easier to use, and more maintainable implementations. Interoperability is a key, often overlooked feature. By building systems that play nicely with other software, or provide foundational components, systems may become an integral part of the software ecosystem, effectively increasing their overall impact. Such interoperability may be provided via good APIs, support of a wide range of data formats, and/or architectures that support a modularized organization, enabling easy inclusion of valuable software components into other systems. It is worth mentioning that there is a dark side to excessive use of novel technologies. For example, adopting bleeding-edge systems too early can produce hard-to-use, brittle, under-featured, and poorly performing systems – sometimes simpler, easy to use technologies can improve sustainability by providing fully-featured systems which a community can readily contribute to and maintain.
To assign a final sustainability score, we assign a value in the range [0,100] to each of the four value areas: Impact (I), Risks (R), Community (C), and Technology (T). We then combine them to produce a weighted score using this formula:
Sustainability Score = FI * I + FR * R + FC * C + FT * T
We typically use FI = FC = 1/3, and FR = FT = 1/6 to produce a maximum score of 100. While there is plenty of disagreement as to the value of these weighting factors, there tends to be general agreement that if a software system is impactful enough, and/or the community is vital enough, the chances that a software system is sustainable are greatly increased (as reflected in the higher factors FI = FC = 1/3). It is easy to underestimate the potential of an impassioned community rallying around an impactful software system: anecdotal evidence suggests that deficiencies in technology or IP risks can be overcome, and may even take the software to another planet.
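As a minimal sketch, the weighted combination above can be written in a few lines of Python. The function name `sustainability_score`, the `DEFAULT_WEIGHTS` table, and the example area scores are hypothetical and chosen only for illustration; the weights are the ones quoted above. The sketch also assumes that a higher Risks value means better-managed risk, since 100 is the most sustainable score.

```python
import math

# Weighting factors quoted in the text: FI = FC = 1/3, FR = FT = 1/6.
DEFAULT_WEIGHTS = {
    "impact": 1 / 3,
    "risks": 1 / 6,
    "community": 1 / 3,
    "technology": 1 / 6,
}


def sustainability_score(impact, risks, community, technology, weights=None):
    """Combine four value area scores (each in [0, 100]) into one score.

    Assumption: a higher value is better in every area, including Risks
    (i.e., a high Risks value means risks are well managed).
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    areas = {
        "impact": impact,
        "risks": risks,
        "community": community,
        "technology": technology,
    }

    # Each value area score must lie in the range [0, 100].
    for name, value in areas.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be in [0, 100], got {value}")

    # The weights must sum to 1 so that the maximum final score is 100.
    if not math.isclose(sum(weights[name] for name in areas), 1.0):
        raise ValueError("weights must sum to 1")

    return sum(weights[name] * value for name, value in areas.items())


# Hypothetical project: strong impact and community, weaker risk management.
example = sustainability_score(impact=80, risks=50, community=70, technology=60)
print(f"Sustainability score: {example:.1f}")
```

With these made-up inputs the score works out to roughly 68.3. Because Impact and Community each carry twice the weight of Risks and Technology, raising either of them by 10 points moves the final score about twice as much as the same improvement in Risks or Technology.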
The single most important benefit of estimating an SSM score is not in the final score itself. Rather, it is the process of working through the various metrics that make up the SSM. We often find that customers and collaborators can make significant improvements to their sustainability by focusing on just a few areas, such as energizing an existing community, improving software processes, adopting permissive open source licenses, and/or refactoring code to make it easier to use and reuse.
In our experience, open source software with high impact that is associated with a large, vital community is poised to be sustainable over the long term. Engaged, enthusiastic communities can typically overcome shortcomings due to technical deficiencies or risks from IP, disappearing developers, or the changing software and hardware terrain. This requires building software cultures that attract and inspire good developers, and ensure stability through sound software processes, especially those that encourage continuous testing. The SSM score is an initial attempt to quantify these characteristics, and we’ve found it to be a useful metric by which to assess and improve the sustainability of software systems.
 – Kitware Blog: How Sustainable is Your Software?
 – Linux operating system market size
 – Wikipedia: Bus factor
 – Wikipedia: SCO-Linux disputes
 – The GitHub Blog: Open source goes to Mars