How Do We Gather Scientific Knowledge?
The full talk is available at:
Stodden introduces the principles of scientific research
by going back to Roger Bacon in 1267 and his concepts of:
Verification of conclusions by Direct Experiment
Importance of Independent Verification
Recording experiments with enough detail that others could reproduce the work
She continues with Francis Bacon in 1620,
who introduced the idea of inductive reasoning
(going from experimental observations to generalizations),
and with the influence of this philosophy on the Royal Society of London
around 1660, where the first scientific journal, “Philosophical Transactions”,
was created in 1665.
Stodden then brings us back to the present to make the points that:
“Scientific Computation is becoming central to the Scientific Method
Changing how research is conducted in many fields
Changing the nature of how we learn about our world”
and shares her conjecture that:
“Today’s academic scientist probably has more in common
with a large corporation’s information technology manager
than with a Philosophy or English professor at the same university.”
She then points out that the pervasive use of computation in
scientific research is, unfortunately, not being accompanied
by an equal effort to make available the source code and
materials used during the research process.
In particular, there is a lag in making scientific data publicly
available under Open Data terms.
She then brings our attention to a significant contemporary problem:
“Relaxed practices regarding the communication of
computational details is creating a credibility crisis
in computational science, not only among scientists
but as a basis for policy decisions and in the public mind.”
She also questions whether modern peer-reviewed
journals really provide an effective platform for scientific
discussion.
As an example, she presents the case of the cancelled
clinical trials at Duke, where deficiencies in the
computational practices of the original papers went
undetected during peer review because of the superficial
way in which peer review is currently conducted.
The emergence of Computational Research as a third approach
to the scientific process (besides inductive and deductive reasoning)
is challenged by the lack of open sharing of data and source code.
As a result, most published computational results are
“…simply impossible to replicate…”.
Stodden surveyed a community of authors about why they
are unwilling to share data and code, and
found the reasons to include:
Time required to clean and document
Time required to deal with questions from users
Preoccupation about not receiving attribution
Possibility of pursuing patents
Legal barriers (e.g. copyright)
Potential loss of future publications
Competitors may gain an advantage
Web/disk space
and…The Pursuit of Tenure…
while the top reasons to share were:
Encourage Scientific Advancement
Encourage sharing with others
Be a good community member
Set a standard for the field
Improve the caliber of research
Get others to work on the problem
Increase in publicity
Opportunity for feedback
She closes with a discussion on:
How do we deal with large bodies of source code?
How do we deal with massive data?
When we share software, who will maintain it?
The need for data provenance tools.
How do we train users in the proper use of shared code?
The fragility of software
A very interesting talk for anyone involved
in the practice of scientific research: