The Spring of Data Sharing
The cold Winter tradition of clinging to data and restricting access to information is now thawing and giving way to a warm Spring of information sharing.
Here are some of the promising events with which we have started the Spring of 2012:
- The Research Works Act has been defeated (the Bad Act)
- The Federal Research Public Access Act is moving forward (the Good Act)
- New Data Sharing sites are germinating (more below)
It has finally clicked that information is useful not when it is accumulated into piles but when it is flowing, much as an electric charge is of no use until it flows and becomes electric current. It follows that any resistance to the flow of information results in heat dissipation, loss of power, and economic waste.
People are also starting to realize that a large portion of research and development in the U.S. is funded by taxpayers. The U.S. invests 2.8% of GDP in R&D, which is about $420 billion a year. Of that, $142 billion comes from the U.S. Federal R&D budget, and therefore from taxpayers. In this context, there is no economic logic in restricting the flow of the resulting information (technical and scientific papers) merely for the convenience of the scientific publishers and their $2 billion-per-year market.
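To put those figures side by side, here is a minimal Python sketch that uses only the numbers quoted above. The variable names are illustrative, and the implied GDP is simply back-calculated from the 2.8% figure rather than taken from any official source.

```python
# Figures quoted in the paragraph above, in billions of US dollars per year.
total_rd = 420.0          # total U.S. R&D spending
federal_rd = 142.0        # portion provided by the federal R&D budget
publishing_market = 2.0   # size of the scientific publishing market
rd_share_of_gdp = 0.028   # R&D spending as a fraction of GDP

# GDP implied by "2.8% of GDP is about $420 billion" (a back-calculation).
implied_gdp = total_rd / rd_share_of_gdp

# Fraction of all R&D spending that is taxpayer funded.
federal_fraction = federal_rd / total_rd

# Size of the publishing market relative to the taxpayer-funded R&D budget.
publishing_vs_federal = publishing_market / federal_rd

print(f"Implied GDP: ${implied_gdp / 1000:.1f} trillion")
print(f"Federal share of R&D: {federal_fraction:.1%}")
print(f"Publishing market vs. federal R&D: {publishing_vs_federal:.1%}")
```

With these inputs, taxpayers fund roughly a third of all U.S. R&D, while the publishing market amounts to less than 1.5% of the federal R&D budget.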
Here are some of the flourishing sites and organizations that will replace the now-obsolete practices of the closed-access publishers.
Digital Science provides software and information to support researchers and research administrators in their everyday work, with the ultimate aim of making science more productive through the use of technology. As well as developing its own solutions, Digital Science also invests in promising start-ups and other partners, working closely with them to help them realize their full potential. Its platforms include FigShare and 1DegreeBio.
BioSharing works at the global level to build stable linkages, in particular between the journals and funders that are implementing data sharing policies and the well-constituted standardization efforts in the biosciences domain. The goal of this work is to expedite the communication and production of an integrated, standards-based framework for the capture and sharing of high-throughput genomics and functional genomics bioscience data. This objective is achieved via the creation of web-based catalogs and a communication forum.
BuzzData gives your data a permanent home online so that it can evolve into actionable insights. BuzzData gives you excellent version control so you can see how your data evolves over time. Revert to old versions effortlessly, whenever you need them. Tag your datasets and build your own multi-dimensional data universe. Follow people, datasets and tags so you get the information you need faster. Add context to your data with attachments, articles and visualizations, and engage in discussion and direct activity with Tasks.
Amazon S3 Public Datasets
Public Data Sets on AWS provide a centralized repository of public datasets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.
Patients Like Me
PatientsLikeMe is committed to putting patients first. They do this by providing a better, more effective way for you to share your real-world health experiences in order to help yourself, other patients like you and organizations that focus on your conditions.
The Midas Platform is an open-source toolkit that enables the rapid creation of tailored, web-enabled data storage. Designed to meet the needs of advanced data-centric computing, Midas Platform addresses the growing challenge of large data by providing a flexible, intelligent data storage system. The system integrates multimedia server technology with other open-source data analysis and visualization tools to enable data-intensive applications that easily interface with existing workflows.
Midas Platform provides a variety of data access methods, including web, file system and DICOM server interfaces, and makes it easy to extend data storage to other relational and non-relational databases. Optimized for efficiently centralizing, indexing, and storing massive collections of data, Midas Platform provides the foundation for computational scientific research. Example instances are available at http://insight-journal.org/midas/ and http://midas.kitware.com/.
There are also journals that strive for reproducibility and open access:
BioMedCentral Open Research Computation
Open Research Computation publishes peer-reviewed articles that describe the development, capacities, and uses of software designed for use by researchers. Submissions relating to software for use in any area of research are welcome, as are articles dealing with algorithms, useful code snippets, large applications, web services, and libraries. Open Research Computation differs from other journals with a software focus in its requirement that the software source code be made available under an Open Source Initiative compliant license, and in its assessment of the quality of the software's documentation and testing. In addition to articles describing software, Open Research Computation also welcomes submissions that review or describe developments relating to software-based tools for research. These include, but are not limited to, reviews or proposals for standards, discussion of best practice in research software development, educational and support resources, and tools for researchers that develop or use software-based tools.
BioMedCentral Research Notes
BMC Research Notes is an open access journal publishing scientifically sound research across all fields of biology and medicine. The journal provides a home for short publications, case series, and incremental updates to previous work with the intention of reducing the loss suffered by the research community when such results remain unpublished.
BMC Research Notes also encourages the publication of software tools, databases and datasets and a key objective of the journal is to ensure that associated data files will, wherever possible, be published in standard, reusable formats. BMC Research Notes is currently working with researchers across the full spectrum of biomedical research to define appropriate recommendations for domain-specific data file standards.
The Insight Journal is an open access online publication covering the domain of medical image processing and visualization. The unique characteristics of the Insight Journal include: open access to articles, data, code, and reviews; open peer-review that invites discussion between reviewers and authors; emphasis on reproducible science via automated code compilation and testing; and support for continuous revision of articles, code, and reviews.
PLoS ONE supports the development of open-source software and believes that, for submissions in which software is the central part of the paper, adherence to appropriate open source standards will ensure that the submission conforms to the journal's requirements. These requirements state that “methods must be described in sufficient detail so that another researcher is able to reproduce the experiments described” and reflect the journal's “aim to promote openness in research and intention that all work published in PLoS ONE can be built on by future researchers.” Therefore, if new software or a new algorithm is central to a paper, authors must confirm that the software conforms to the definition of open source as defined by the ten rules of the Open Source Initiative. A condition of acceptance is that the software can be run by reviewers accessing the public software and that results presented in the paper are reproducible. The software need only run on one hardware/software platform in common use by the readership (including Matlab), although it must run without dependencies on proprietary or otherwise unobtainable ancillary software. Articles describing software that requires access to databases and other resources whose persistence is not guaranteed (e.g., individual laboratory databases without funding support) will not be considered.
F1000 Research is a new, fully open access publishing program across biology and medicine that will start publishing later this year. It is intended to address the major issues afflicting scientific publishing today: timely dissemination of research, peer review, and sharing of data. Diverging from traditional journal publishing, F1000 Research will offer immediate publication; open, post-publication peer review; and open revisioning of work, including ongoing updates. It will also encourage raw data deposition and publication. In addition, F1000 Research will accept a broad range of article formats and content types.