AAAS: Your Paper MUST include Data, MUST include Code.

March 9, 2011

 

The editorial in the February 11 issue of Science (www.sciencemag.org) draws a line in the sand, making clear that the REPRODUCIBILITY requirement of the scientific method IMPLIES that data and source code MUST be made available at the time scientific results are published.

 

Here are some essential quotes from the Editorial “Making Data Maximally Available” by Brooks Hanson, Andrew Sugden and Bruce Alberts:

 

First, the essence:

“It is obvious that making data widely available is an essential element of scientific research.”

 

Second, the challenge:

“The scientific community strives to meet its basic responsibilities toward transparency, standardization and data archiving […] yet scientists are struggling with the huge amount, complexity, and variety of the data that are now being produced.”

 

Third, the Commitment:

“Science policy for some time has been that ALL DATA NECESSARY to understand, assess, and extend the conclusions of the manuscript must be available to ANY reader of Science (see www.sciencemag.org/site/feature/contribinfo).”

Fourth, the realization that the software used to process the data MUST be made available as well:

“To address the growing complexity of data and analyses, Science is extending our data access requirement … to include COMPUTER CODES involved in the creation or analysis of data.”

 

Fifth, the acceptance that Journals can’t trigger these changes by themselves:

“As gatekeepers to publication, journals clearly have an important part to play in making data publicly and permanently available. But the most important steps for improving the way that science is practiced and conveyed must come from the wider scientific community.”

 

Sixth, YOU HAVE TO DO YOUR PART. Yes, YOU!

“Scientists play critical roles in the leadership of journals and societies, as reviewers for papers and grants, and as authors themselves. We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much improved data curation.”

 

Therefore, if you submit papers, review papers or read papers, it is up to YOU to make sure that the process of scientific publishing truly serves the purpose of advancing the scientific enterprise.


I just did my part.

Yesterday I declined an invitation to review papers for a conference (that must not be named) because it does not have a reproducibility requirement (and, as a consequence, papers are not required to include data or software).

Now it is your turn.

Do your part!

 

3 comments to AAAS: Your Paper MUST include Data, MUST include Code.

  1. Only one comment:

    Reproducibility is the core of science; therefore, conference organizers should not need any incentive to add this requirement. It is, in fact, a sad state of science if it is not assumed implicitly. IMO it is the obligation of the reviewers to reject papers if they are not reproducible. By declining to review for this conference, you missed that opportunity.

    Fun fact: A paper that got the “best paper award” at a certain conference with a quite high rejection rate was missing all the parameters needed to run the algorithm – in other words, reviewers who put a high weight on reproducibility are needed.

  2. Gert,

    You are quite right:

    “Reproducibility is the core of science”

    That is what makes it so surprising that almost no Conference or Journal makes it a requirement for publication.

    It is not even an item in the review criteria.

    Most conferences and journals, however, are very explicit about requiring “novelty”. This indicates two things:

    1) That it is indeed possible to give guidelines to reviewers and authors,

    and

    2) That most conferences and journals confuse their role as scientific communicators with being a cheap substitute for the Patent Office, or simply a “book” publisher.

    See for example

    A) IEEE TMI:

    http://www.ieee-tmi.org/Reviewer-Info.html

    “Is the contribution novel and revolutionary enough to warrant publication in TMI?”

    (No mention of REPRODUCIBILITY on the same page.)

    In my book: This is not a Scientific Journal.

    B) Medical Image Analysis (Elsevier)

    http://www.elsevier.com/wps/find/reviewershome.reviewers/reviewersguidelines#-%C2%A0%C2%A0%C2%A0%C2%A0Originality

    “Is the article sufficiently novel and interesting to warrant publication?”

    No mention of REPRODUCIBILITY on the same page.

    They are very concerned, though, about “plagiarism”, “fraud”, and “other ethical violations”, among which, apparently, the inability to replicate the work is not an issue.

    In my book: NOT a Scientific Journal.

    These two are essentially “Marketing” venues used to advertise the work of the authors, without any interest in whether such work is useful to the readers.

    They care about whether things are NEW, but they don’t care about whether they really WORK. Quite close to the Patent office indeed…

    On the other hand, as an example that it is possible to do the right thing and behave like a real scientific society, here is CVPR:

    The ONLY IEEE Conference that cares about REPRODUCIBILITY.

    C) CVPR 2010:

    http://cvl.umiacs.umd.edu/conferences/cvpr2010/submission/

    “Repeatability Criteria: The CVPR 2010 reviewer form will include the following additional criteria, with rating and associated comment field: “Are there sufficient algorithmic and experimental details and available datasets that a graduate student could replicate the experiments in the paper? Alternatively, will a reference implementation be provided?”. During paper registration, authors will be asked to answer the following two checkbox questions: “1. Are the datasets used in this paper already publicly available, or will they be made available for research use at the time of submission of the final camera-ready version of the paper (if accepted)? 2. Will a reference implementation adequate to replicate results in the paper be made publicly available (if accepted)?”

    Unfortunately, I can’t find the same criteria being brought forward in the 2011 edition of the conference…

    http://cvpr2011.org/reviewing.html

    In summary, most Conferences and Journals turn a blind eye to the reproducibility principle of science, and have a pathological addiction to the concept of “novelty” that has nothing to do with the scientific method.

    Why is that?

    Maybe it reflects that the current culture of scientific and technical societies revolves around reputation, fame, glory, and credit, and that there is very limited interest in pushing their respective fields forward. It may also reflect the fact that most of these communities were never educated in the principles of epistemology, and can’t tell the difference between an inventor and a researcher.

    This confirms the “Shirky Principle”:

    “Institutions will try to preserve the problem to which they are the solution.”

    http://en.wikipedia.org/wiki/Clay_Shirky

    A final comment

    I must disagree with you that I should have simply accepted to review for the (not to be named) conference and then applied the criterion of reproducibility despite the fact that it is not a policy of the conference. That would have been a plain injustice to the authors of those papers. I would have rejected their papers systematically, even though they would have been similar to other papers accepted by fellow reviewers who don’t appreciate the importance of reproducibility.

    What I did instead was to decline, explain why I was declining, and offer to further explain the importance of reproducibility to the conference organizers.

    —-

    And to anyone who claims that “It is too hard…”, please allow me to show you a REAL Scientific Journal:

    PLoS ONE

    http://www.plosone.org/static/reviewerGuidelines.action

    Review Guidelines:

    “Are the experiments, statistics, and other analyses performed to a high technical standard and are described in sufficient detail?”

    “The research must have been performed to a technical standard high enough to allow robust conclusions to be drawn from the data. Methods and reagents must also be described in sufficient detail so that another researcher is able to reproduce the experiments described.”

    “Does the article adhere to appropriate reporting guidelines (e.g. CONSORT, MIAME, STROBE, EQUATOR) and community standards for data availability?”

    “PLoS ONE aims to promote openness in research and intends that all work published in PLoS ONE can be built on by future researchers. We therefore demand conformity to standards for the public deposition of data (for example gene sequences, microarray expression data, and structural studies). Other similar standards that are applicable to specific communities should also be upheld. Failure to comply with community standards is a justifiable reason for rejection.”

    This is the kind of community that is really interested in advancing the knowledge of the field.

    Not only is PLoS ONE very explicit about the reproducibility criterion:

    http://www.plosone.org/static/reviewerGuidelines.action#other

    “Does the paper offer enough details of its methodology that its experiments could be reproduced?”

    They also understand that part of the role of scientific publishing is to make science accessible to the non-specialist (the taxpayers who pay all our salaries so that we can spend our time on the interesting work of research).

    http://www.plosone.org/static/reviewerGuidelines.action#other

    “Is the manuscript written clearly enough that it is understandable to non-specialists? If not, how could it be improved?”

    —–

    It is time to let go of the old pretentious societies that just do “fake science”, and for us to embrace the new publishing venues that understand the scientific process and are honestly committed to it.

  3. Actually, Medical Image Analysis has some requirements on reproducibility; they just phrase it as “replicate the research”:

    Specifically, they write in Conducting the Review | Structure | Methodology:

    “Does the author accurately explain how the data was collected? Is the design suitable for answering the question posed? Is there sufficient information present for you to replicate the research? Does the article identify the procedures followed? Are these ordered in a meaningful way? If the methods are new, are they explained in detail? Was the sampling appropriate? Have the equipment and materials been adequately described? Does the article make it clear what type of data was recorded; has the author been precise in describing measurements?”

    And in the author guidelines it is written:

    “Provide sufficient detail to allow the work to be reproduced. …”

    Unfortunately, they are not very explicit about what “sufficient” actually means.
