Open Science: Tools, approaches, and implications

Shirley Wu

Cameron Neylon

February 8, 2008

Introduction and focus

        The practice of science undergoes constant evolution. As discoveries are made, technologies developed, and data generated, new approaches for conducting science arise and flourish. We are currently witnessing an unprecedented period of scientific and technological advancement, due mostly to the ubiquity, connectivity, and power of computing at multiple levels. Not only has computing drastically changed our ability to produce and analyze data, it is also changing the ways in which we store knowledge and communicate about science. A common theme emerges from these changes: openness.

        Openness in science manifests itself in many ways. Open source tools and open access publishing are, by now, familiar concepts. Research would, no doubt, stall without the many public scientific databases and repositories available. The proliferation of such databases, in turn, has spurred the development of open standards and terminologies for data and information exchange ranging from experimental data [1,2] and systems biology knowledge [3] to biomedical ontologies for text mining [4,5]. Perhaps most notably, the last few years have witnessed the adoption of what is being termed Open Notebook Science - the practice of disclosing publicly all or part of one's research or laboratory activities, usually through the use of blogs and wikis [6]. As the trend towards Open Notebook Science makes clear, open practices in science depend absolutely on the tools and resources provided through the internet. The central issues in enabling open practices revolve around capturing, annotating, presenting and interpreting data, as well as addressing the social and cultural barriers that arise.

This session would address the development and practice of Open Science with an emphasis on the following areas:


Openness is crucial to successful science

        Without openness, science, and especially the biomedical sciences, would suffer. This is most evident with regard to open data. Many fields rely on open data from public databases such as GenBank [7], Swiss-Prot [8], and the Protein Data Bank [9], to name only a few. The availability of scientific literature also influences the rate at which research advances. Open data, open access, and open source have all become indispensable to research in the biomedical sciences, and their success suggests that even greater benefit would result from increased openness. Governments all over the world are throwing their weight behind Open efforts, the most recent example of which is the NIH mandate in the U.S. that all publicly-funded investigators make their publications open access, which was signed into law in early January [10]. Perhaps more importantly, national institutes are recognizing the benefits of open practices and are funding broad initiatives for open research frameworks on some of biomedicine's biggest unsolved problems [11].

Scientific discourse on Open Science is falling behind its practice

        It is evident that there is growing interest in and implementation of open practices in science, and yet rigorous forums for presenting methods and discussing issues have been rare, lost among special interest groups and fringe sessions convened for some related but different purpose. Examples are the Bioinformatics Open Source Conference, held annually as a Special Interest Group (SIG) at Intelligent Systems for Molecular Biology (ISMB), the BioOntologies and BioLINK SIGs at ISMB the past few years, and a Birds of a Feather session at ISMB 2007. In addition to ISMB, the American Medical Informatics Association (AMIA) held sessions on health data exchange and communication in 2007 and 2008, and PSB itself regularly features sessions on data integration, Semantic Web, ontologies, and BioNLP, all of which are related to Open Science as either applications, beneficiaries, or enabling technologies. None of these previous meetings was expressly focused on Open Science as a general concept, however. The best example of an Open Science-themed meeting may be the 2008 Science Blogging Conference held in mid-January in North Carolina [12], where several of the sessions concentrated on Open Science, public scientific data, and Open Science in the developing world. However, this was not a scientific meeting focused on Open Science.

        Similarly, as Open Science is a relatively novel concept, few scientific publications have addressed it specifically. The few peer-reviewed studies that have been published investigate data sharing and open access literature in the biomedical sciences [13-18]. In contrast, Nature has published many “perspectives” and editorials on data sharing, Open Access, and e-Science [19-21]. The fastest growing body of literature on the subject by far, however, is appearing on the web through non-peer-reviewed channels such as Nature Precedings [22,23], blogs [24-27], and popular media, including recent articles in BusinessWeek [28], the NY Times [29], Wired [30], and Scientific American [31].

Why PSB?

        PSB is a high-quality conference that addresses the intersection between the biomedical sciences and computing through community-proposed sessions on "hot topics". Investigators in burgeoning research areas may meet at PSB to help define their fields, set goals, and discuss issues relevant to the development of new technologies and methodologies. Open Science fits this environment very well, for it potentially involves integration between all aspects of science - scientific discourse, hypothesis generation, experimental design, data generation and analysis, presentation of results, data exchange, and formal publication - with computing. And although concepts such as Open Data, Open Source, and Open Access are established to varying degrees, Open Science as a whole is still relatively new. The Open Science community is growing rapidly, and would benefit greatly from an international forum exploring the different tools and resources, socio-cultural and policy issues, and scientific findings relevant to the development of open practices in science. PSB is a uniquely appropriate venue for a meeting on Open Science due to its tradition of exploring new scientific themes and the fact that its audience is likely to be greatly invested in the benefits and outcomes of Open Science.

        Adoption of open practices, although widespread in public and government institutions, is still rare at the level of the individual researcher due to technological and cultural obstacles. Both types of obstacles can be addressed in an international, scientific forum exploring the tools, resources, and questions facing Open Science. It is time for the biomedical sciences and biocomputing - which arguably have the most to gain - to begin exploring the challenges and potential within Open Science as they would any other new technology or development. By participating in a session on Open Science, the research community convened at PSB will be uniquely prepared to undertake much needed methodology development, scientific inquiry, and discussion necessary for advancing Open Science. In particular, systematic studies of the current scientific climate and challenges of Open Science - behavioral, cultural, technological - are needed. This session on Open Science would highlight research, tools, and issues relevant to Open Science both to those active in the Open Science community and those interested in learning about Open Science.

Community involvement

        The community is primed to contribute additional research on the climate and culture of science, as well as tools and resources designed to facilitate Open Science. Papers can be solicited from a number of angles related to Open Science, such as from the bio-ontologies, BioNLP, or open source tools communities. We will also solicit research and policy papers from those involved in Open Access publishing (PLoS, BioMed Central, Nature) and open data sharing. Importantly, however, we will invite those who are directly involved in the development, research, or practice of Open Science, including, but not limited to: Jean-Claude Bradley (Drexel University), Rosie Redfield (University of British Columbia), Michael Barton (University of Manchester), Peter Suber (Open Access correspondent at the Scholarly Publishing and Academic Resources Coalition), Bill Hooker (Shriners Hospital -- Portland), Heather Piwowar, Justin Lustgarten and Wendy Chapman (University of Pittsburgh), Pedro Beltrao (UCSF), Jeremy Frey (Southampton University), Dave de Roure/Carole Goble (Southampton and Manchester Universities and the MyExperiment project), Peter Murray-Rust (Cambridge), and Gunther Eysenbach (University of Toronto). Several of these researchers have already committed their support should the session be accepted (see Appendix B).

Format of a session on Open Science at PSB

        This is an unconventional area for a PSB session and the proposed format is also unconventional, though not prohibitively so by any means. The topics to be covered and the format of the session have been discussed by the community, which has come to a broad, although not complete, consensus on the current proposal. Several potential issues stand out: work in this area covers a wide range of disciplines well beyond computational and biological science, and for some of these areas it is not clear that 'research papers' in the conventional sense can be sought; many important workers in the field are based in disciplines where conference proceedings do not contribute to career advancement; this is an extremely rapidly moving field and if all oral papers are required nine months in advance the session may well be overtaken by events; the field is driven in large part by early career researchers (e.g. this proposal was largely drafted by a Stanford graduate student) and we should work to give them the opportunity to present; and the authors will be certain to make retention of copyright a condition of submission.

On the basis of these conclusions we propose the following format. We appreciate that this falls outside the traditional approach at PSB in a number of areas but believe it is a compromise proposal that will enable us to present a timely and exciting PSB session.

  1. Tutorial (to be determined based on content of other submissions, but may involve a collaborative, interactive activity)
  2. Conference session

    Aside from the Late Breaking Talks, the format is essentially preserved, with changes only to the exact nature of the content. The other modifications are related to the presentation and dissemination of session proceedings. Given the nature of the session all contributors will almost certainly request to retain copyright of any papers, and we would wish to see all published material freely available at a single internet location, something PSB already provides. We may wish to redefine the status of the conference presentations and seek to republish the same or similar papers, including those that are not available for the formal printed proceedings, in the Open Access literature (e.g. in a special issue). This would clearly need to be the subject of a discussion between the session organizers, the conference organizers, and the proceedings publishers.

        Similarly, the session would have a web presence which would include the oral presentations and relevant posters, and ideally would be the clearing house for recorded and streaming video and audio of the presentations. The session organizers would work to determine how best to accomplish this with minimum intrusion into other sessions. Alternatively, we could also work to produce recorded video of the other sessions at PSB, if this is attractive to the conference organizers. For our session, the "virtual" portion could be incorporated into the conference website or hosted separately, and the session organizers and volunteers from the community would develop and carry this out with minimum burden on the conference organizers.

Organizer autobiographical notes

Cameron Neylon (STFC Rutherford Appleton Laboratory) is a biophysicist who has always worked in interdisciplinary areas. After undergraduate studies in metabolic biochemistry (U. West. Aust) he has pursued research in molecular biology, biophysics, and high throughput methods (Aust. Natl Univ. and Univ. Bath). In 2001 he took up the position of Lecturer in Combinatorial Chemistry at the University of Southampton and in 2005 he commenced a joint appointment as Senior Scientist in Biomolecular Sciences at the ISIS Neutron Scattering Facility, STFC Rutherford Appleton Laboratory, UK. Dr Neylon is a key contributor to the Research Councils UK funded 4G Basic Technology Programme (£5.1M). Through this and other projects he has gained extensive experience of the challenges of working within and managing complex multidisciplinary research programmes, with recent papers in journals as diverse as Cell, Nature Physics, Complex Systems, and Journal of Combinatorial Chemistry. In 2005, in collaboration with Professor Jeremy Frey (University of Southampton), he obtained UK research council (BBSRC) funding to develop and optimise an electronic notebook system for biochemistry laboratories, which has led to his involvement in the Open Science movement. His group is currently moving to a fully Open Notebook [5,18] approach which is being recorded and analysed in his blog, Science in the Open [21]. In 2007 he gave invited talks on Open Science at Drexel University and as part of the International Genetically Engineered Machines workshop at MIT. He has managed several workshops and conferences, including Neutrons in Biology (July, 2007, Rutherford Appleton Laboratory), a satellite meeting of the European Biophysical Societies Association meeting with attendees from all over the world. He also coordinates the recently STFC-funded Research Network for Biomembrane Structure and Function and leads a recent funding application to the UK Engineering and Physical Sciences Research Council for an E-science Network scheme to support meetings to develop data portability and presentation standards for Open Science practices.

Shirley Wu (Stanford University) has been interested in too many things to count since a very early age, explaining in large part her attraction to the interdisciplinary field of bioinformatics. She obtained her Sc.B. in Computational Biology from Brown University and is currently a Ph.D. student at Stanford University, with research projects investigating protein function annotation and text mining to aid in annotation. Notably, her experiences as a graduate student and the experiences of her peers have led her to explore questions related to the sociology of science and the sharing and dissemination of scientific information. Her conclusion is that barriers to learning and conducting science are widespread but surmountable with the right tools and policies in place. To become involved, she began a blog called One Big Lab [22] to explore the issues relevant to Open Science, but decided that the best way to familiarize herself with Open Science would be to take an active role in its development.


