Scientific American: Secret Source Code is Bad for Science
This article, by Mr. Hsu, is
about the changes required of the scientific establishment to
restore the proper practice of the scientific method in modern science.
Here are some highlights of the article:
“Modern science relies upon researchers sharing their work so that their peers can check and verify success or failure. But most scientists still don’t share one crucial piece of information — the source codes of the computer programs driving much of today’s scientific progress.”
We found that this is indeed the case in the field of Medical Image Analysis,
where no conferences require the disclosure of source code as part of the
process of submission and publication of articles.
The current common practice of “Challenge” events, in which algorithmic results
are compared in the context of a competition but source code is neither shared nor
redistributed, is a common example of a bad implementation
of a good idea. Unless the source code is made publicly available, these
events are limited to providing a marketing platform for the participants, with little or
no impact on the progress of the field.
“Now, a group of scientists is arguing for new standards that require newly published studies to make their source codes available. Otherwise, they say, the scientific method of peer review and reproducing experiments to verify results is basically broken.”
“Missing source codes mean extra headaches for scientists who want to closely follow up on new studies or check for errors. Such unavailability of source codes can also lead to more bad science slipping through the cracks — unreleased and irreproducible codes played a part in a Duke University case that led to study retractions, scientist resignations and canceled clinical drug trials for lung and breast cancer in 2010.”
This is the case we commented on in the post “Reproducibility: When Reproduction Fails!”
Mr. Hsu highlights the critical point required in order to change
the way the system operates, which is to demand source code in publications:
“But of the 20 most-cited science journals in 2010, only three require computer source codes to be made available upon publication. Morin and six colleagues from universities across the U.S. proposed making such policies universal in a policy forum paper that appears in today’s (April 12) issue of the Journal Science (Science is one of the three top journals that require the availability of source codes).”
The policy forum paper is by Morin, Urban, Adams, Foster, Sali, Baker, and Sliz.
“The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow.”
“Most significant may be the absence of a universal disclosure requirement by the gatekeepers of scientific publishing. Of the 20 most-cited journals in 2010 from all fields of science (15), only three (16–18) (including Science) have editorial policies requiring availability of computer source code upon publication. This stands in stark contrast to near-universal agreement among the 20 on policies regarding availability of data and other enabling materials.”
The obvious topic of Open Source surfaces:
“Beyond allowing others to inspect and understand the inner workings of a computer program, open source software (OSS) licenses encourage the free adoption, reuse, and adaptation of computer source code while also assuring the attribution and citations customary in scientific research. For the scientist-programmer, disseminating software under an OSS license can be a simple method for enabling community participation in development, use, and adoption of a program and can lead to enhanced influence, reputation, and increased rates of citation for the author (19). Numerous types of OSS licenses exist to meet the diverse needs of academic environments, many of which were developed by and for academics working at research institutions [e.g., Berkeley Software Distribution (20), MIT (21), and Educational Community License (22)]. OSS licenses are also fully compatible with commercialization of scientist-created software (23) and Bayh-Dole requirements that allow the patenting of inventions created using public funds (24).”
For our part, we have had reproducibility requirements for the Insight Journal since it was created in 2005.
These require that each article include all the materials needed to replicate the work:
source code, input data, output data, tests and parameters.
Back to the Scientific American article:
The roots of some of the closed behaviours are explored:
“Many scientists have learned to write computer code without formal training, and so they may simply not know of the open-source software culture of sharing such codes, Morin and his colleagues said. Others may simply be embarrassed by the “ugly” code they write for their own research.”
The article also touches on one of the well-known reasons why Open Source software leads to higher-quality software:
“If I knew there was a publication requirement for my code, I probably would have done things like comment it better, kept better track of it, and generally put a bit more thought and effort into my code — which would have certainly helped me and others later on when I inevitably tried to reuse or share it, even if just with others in my own research group,” Morin said.
The groups that are honestly interested in the progress of scientific
research are taking the lead on ensuring that results of research
projects are made available to the public for dissemination and
verification. This includes the daily practice of preparing and sharing
Open Access publications, Open Data, and Open Source software,
which are the pillars of the practice of Open Science.