(Final draft of PSB 2009 session proposal - submitted ~ 6:00PM PST Feb 8 2008.)

Open Science: Tools, approaches, and implications

Submitted by:

Shirley Wu

        Program in Biomedical Informatics

        MSOB X-215

        251 Campus Drive

        Stanford, CA 94305

        shwu19@stanford.edu

Cameron Neylon

        ISIS Neutron Facility

        STFC Rutherford Appleton Laboratory

        Harwell Science and Innovation Campus

        Didcot, OX11 0QX, UK

        c.neylon@rl.ac.uk

February 8, 2008

Introduction and focus

        The practice of science undergoes constant evolution. As discoveries are made, technologies developed, and data generated, new approaches for conducting science arise and flourish. We are currently witnessing an unprecedented period of scientific and technological advancement, due mostly to the ubiquity, connectivity, and power of computing at multiple levels. Not only has computing drastically changed our ability to produce and analyze data, it is also changing the ways in which we store knowledge and communicate about science. A common theme emerges from these changes: openness.

        Openness in science manifests itself in many ways. Open source tools and open access publishing are, by now, familiar concepts. Research would, no doubt, stall without the many public scientific databases and repositories available. The proliferation of such databases, in turn, has spurred the development of open standards and terminologies for data and information exchange ranging from experimental data [1,2] and systems biology knowledge [3] to biomedical ontologies for text mining [4, 5]. Perhaps most notably, the last few years have witnessed the adoption of what is being termed Open Notebook Science - the practice of disclosing publicly all or part of one's research or laboratory activities, usually through the use of blogs and wikis [6]. As the trend towards Open Notebook Science makes clear, open practices in science depend absolutely on the tools and resources provided through the internet. The central issues in enabling open practices revolve around capturing, annotating, presenting and interpreting data, as well as addressing the social and cultural barriers that arise.

This session would address the development and practice of Open Science with an emphasis on the following areas:

Justification

Openness is crucial to successful science

        Without openness, science, and especially the biomedical sciences, would suffer. This is most evident with regard to open data. Many fields rely on open data from public databases such as GenBank [7], Swiss-Prot [8], and the Protein Data Bank [9], to name only a few. The availability of scientific literature also influences the rate at which research advances. Open data, open access, and open source have all become indispensable to research in the biomedical sciences, and their success suggests that even greater benefit would result from increased openness. Governments all over the world are throwing their weight behind Open efforts, the most recent example of which is the NIH mandate in the U.S. that all publicly funded investigators make their publications open access, which was signed into law in early January [10]. Perhaps more importantly, national institutes are recognizing the benefits of open practices and are funding broad initiatives for open research frameworks on some of biomedicine's biggest unsolved problems [11].

Scientific discourse on Open Science is falling behind its practice

        It is evident that there is growing interest in and implementation of open practices in science, and yet rigorous forums for presenting methods and discussing issues have been rare, lost among special interest groups and fringe sessions convened for some related but different purpose. Examples are the Bioinformatics Open Source Conference, held annually as a Special Interest Group (SIG) at Intelligent Systems for Molecular Biology (ISMB), the BioOntologies and BioLINK SIGs at ISMB the past few years, and a Birds of a Feather session at ISMB 2007. In addition to ISMB, the American Medical Informatics Association (AMIA) held sessions on health data exchange and communication in 2007 and 2008, and PSB itself regularly features sessions on data integration, Semantic Web, ontologies, and BioNLP, all of which are related to Open Science as either applications, beneficiaries, or enabling technologies. None of these previous meetings were expressly focused on Open Science as a general concept, however. The best example of an Open Science-themed meeting may be the 2008 Science Blogging Conference held in mid-January in North Carolina [12], where several of the sessions concentrated on Open Science, public scientific data, and Open Science in the developing world. However, this was not a scientific meeting focused on Open Science.

        Similarly, as Open Science is a relatively novel concept, few scientific publications have addressed it specifically. The few peer-reviewed studies that have been published investigate data sharing and open access literature in the biomedical sciences [13-18]. In contrast, Nature has published many “perspectives” and editorials on data sharing, Open Access, and e-Science [19-21]. The fastest growing body of literature on the subject by far, however, is taking place on the web through non-peer-reviewed channels such as Nature Precedings [22,23], blogs [24-27], and popular media, including recent articles in BusinessWeek [28], the NY Times [29], Wired [30], and Scientific American [31].

Why PSB?

        PSB is a high-quality conference that addresses the intersection between the biomedical sciences and computing through community-proposed sessions on "hot topics". Investigators in burgeoning research areas may meet at PSB to help define their fields, set goals, and discuss issues relevant to the development of new technologies and methodologies. Open Science fits this environment very well, for it potentially involves integration between all aspects of science - scientific discourse, hypothesis generation, experimental design, data generation and analysis, presentation of results, data exchange, and formal publication - with computing. And although concepts such as Open Data, Open Source, and Open Access are established to varying degrees, Open Science as a whole is still relatively new. The Open Science community is growing rapidly, and would benefit greatly from an international forum exploring the different tools and resources, socio-cultural and policy issues, and scientific findings relevant to the development of open practices in science. PSB is a uniquely appropriate venue for a meeting on Open Science due to its tradition of exploring new scientific themes and the fact that its audience is likely to be greatly invested in the benefits and outcomes of Open Science.

        Adoption of open practices, although widespread in public and government institutions, is still rare at the level of the individual researcher due to technological and cultural obstacles. Both types of obstacles can be addressed in an international, scientific forum exploring the tools, resources, and questions facing Open Science. It is time for the biomedical sciences and biocomputing - which arguably have the most to gain - to begin exploring the challenges and potential within Open Science as they would any other new technology or development. By participating in a session on Open Science, the research community convened at PSB will be uniquely prepared to undertake much needed methodology development, scientific inquiry, and discussion necessary for advancing Open Science. In particular, systematic studies of the current scientific climate and challenges of Open Science - behavioral, cultural, technological - are needed. This session on Open Science would highlight research, tools, and issues relevant to Open Science both to those active in the Open Science community and those interested in learning about Open Science.

Community involvement

        The community is primed to contribute additional research on the climate and culture of science, as well as tools and resources designed to facilitate Open Science. Papers can be solicited from a number of angles related to Open Science, such as from the bio-ontologies, BioNLP, or open source tools communities. We will also solicit research and policy papers from those involved in Open Access publishing (PLoS, BioMed Central, Nature) and open data sharing. Importantly, however, we will invite those who are directly involved in the development, research, or practice of Open Science, including, but not limited to: Jean-Claude Bradley (Drexel University), Rosie Redfield (University of British Columbia), Michael Barton (University of Manchester), Peter Suber (Open Access correspondent at the Scholarly Publishing and Academic Resources Coalition), Bill Hooker (Shriners Hospital -- Portland), Heather Piwowar, Jonathan Lustgarten and Wendy Chapman (University of Pittsburgh), Pedro Beltrao (UCSF), Jeremy Frey (Southampton University), David De Roure/Carole Goble (Southampton and Manchester Universities and the MyExperiment project), Peter Murray-Rust (Cambridge), and Gunther Eysenbach (University of Toronto). Several of these researchers have already committed their support should the session be accepted (see Appendix B).

Format of a session on Open Science at PSB

        This is an unconventional area for a PSB session and the proposed format is also unconventional, though not prohibitively so by any means. The topics to be covered and the format of the session have been discussed by the community, which has come to a broad, although not complete, consensus on the current proposal. Several potential issues stand out: work in this area covers a wide range of disciplines well beyond computational and biological science, and for some of these areas it is not clear that 'research papers' in the conventional sense can be sought; many important workers in the field are based in disciplines where conference proceedings do not contribute to career advancement; this is an extremely rapidly moving field and if all oral papers are required nine months in advance the session may well be overtaken by events; the field is driven in large part by early career researchers (e.g. this proposal was largely drafted by a Stanford graduate student) and we should work to give them the opportunity to present; and the authors will be certain to make retention of copyright a condition of submission.

On the basis of these conclusions we propose the following format. We appreciate that this falls outside the traditional approach at PSB in a number of areas but believe it is a compromise proposal that will enable us to present a timely and exciting PSB session.

  1. Tutorial  (to be determined based on content of other submissions, but may involve a collaborative, interactive activity)
  2. Conference session

    Aside from the Late Breaking Talks, the format is essentially preserved, with changes only to the exact nature of the content. The other modifications are related to the presentation and dissemination of session proceedings. Given the nature of the session all contributors will almost certainly request to retain copyright of any papers, and we would wish to see all published material freely available at a single internet location, something PSB already provides. We may wish to redefine the status of the conference presentations and seek to republish the same or similar papers, including those that are not available for the formal printed proceedings, in the Open Access literature (e.g. in a special issue). This would clearly need to be the subject of a discussion between the session organizers, the conference organizers, and the proceedings publishers.

        Similarly, the session would have a web presence which would include the oral presentations and relevant posters, and ideally would be the clearing house for recorded and streaming video and audio of the presentations. The session organizers would work to determine how best to accomplish this with minimum intrusion into other sessions. Alternatively, we could also work to produce recorded video of the other sessions at PSB, if this is attractive to the conference organizers. For our session, the "virtual" portion could be incorporated with the conference website or hosted separately, and the session organizers and volunteers from the community would develop and carry this out with minimum burden on the conference organizers.

Organizer autobiographical notes

Cameron Neylon (STFC Rutherford Appleton Laboratory) is a biophysicist who has always worked in interdisciplinary areas. After undergraduate studies in metabolic biochemistry (U. West. Aust) he has pursued research in molecular biology, biophysics, and high throughput methods (Aust. Natl Univ. and Univ. Bath). In 2001 he took up the position of Lecturer in Combinatorial Chemistry at the University of Southampton and in 2005 he commenced a joint appointment as Senior Scientist in Biomolecular Sciences at the ISIS Neutron Scattering Facility, STFC Rutherford Appleton Laboratory, UK. Dr Neylon is a key contributor to the Research Councils UK funded 4G Basic Technology Programme (£5.1M). Through this and other projects he has gained extensive experience of the challenges of working within and managing complex multidisciplinary research programmes, with recent papers in journals as diverse as Cell, Nature Physics, Complex Systems, and Journal of Combinatorial Chemistry. In 2005, in collaboration with Professor Jeremy Frey (University of Southampton), he obtained UK research council (BBSRC) funding to develop and optimise an electronic notebook system for biochemistry laboratories, which has led to his involvement in the Open Science movement. His group is currently moving to a fully Open Notebook [6,23] approach which is being recorded and analysed in his blog, Science in the Open [26]. In 2007 he gave invited talks on Open Science at Drexel University and as part of the International Genetically Engineered Machines workshop at MIT. He has managed several workshops and conferences, including Neutrons in Biology (July, 2007, Rutherford Appleton Laboratory), a satellite meeting of the European Biophysical Societies Association meeting with attendees from all over the world. He also coordinates the recently STFC-funded Research Network for Biomembrane Structure and Function and leads a recent funding application to the UK Engineering and Physical Sciences Research Council for an E-science Network scheme to support meetings to develop data portability and presentation standards for Open Science practices.

Shirley Wu (Stanford University) has been interested in too many things to count since a very early age, explaining in large part her attraction to the interdisciplinary field of bioinformatics. She obtained her Sc.B. in Computational Biology from Brown University and is currently a Ph.D. student at Stanford University, with research projects investigating protein function annotation and text mining to aid in annotation. Notably, her experiences as a graduate student and the experiences of her peers have led her to explore questions related to the sociology of science and the sharing and dissemination of scientific information. Her conclusion is that barriers to learning and conducting science are widespread but surmountable with the right tools and policies in place. To become involved, she began a blog called One Big Lab [27] to explore the issues relevant to Open Science, but decided that the best way to familiarize herself with Open Science would be to take an active role in its development.

References

  1. Brazma A et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29, 365–371.
  2. Bioinformatics standards for flow cytometry. (December, 2007) <http://flowcyt.sourceforge.net>
  3. Hucka M et al. (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531.
  4. Bodenreider O. (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 32, D267-70.
  5. Muller H et al. (2004) Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature. PLoS Biol 2(11), e309.
  6. Bradley, J. (September 2006) "Open Notebook Science". <http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html>
  7. Benson DA et al. (2007) GenBank. Nucleic Acids Res 35, D21-5.
  8. Boeckmann B et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31, 365-370.
  9. Berman HM et al. (2000) The Protein Data Bank. Nucleic Acids Res 28, 235-242.
  10. Kaiser J. (January 11, 2008) “NIH announces public-access policy.” ScienceNOW Daily News. <http://sciencenow.sciencemag.org/cgi/content/full/2008/111/1>
  11.  caBIG Strategic Planning Workspace. (2007) The Cancer Biomedical Informatics Grid (caBIG): infrastructure and applications for a worldwide research community. Medinfo 12(1), 330-334.
  12.  NC Science Blogging Conference. (January, 2008) <http://wiki.scienceblogging.com/scienceblogging/>
  13.  Campbell EG and Bendavid E. (2003) Data-sharing and data-withholding in genetics and the life sciences: results of a national survey of technology transfer officers. J Health Care Law Policy 6(2), 241-255.
  14.  Campbell EG et al. (2002) Data withholding in academic genetics: evidence from a national survey. J Am Med Assoc 287(4), 473-480.
  15.  Vogeli C et al. (2006) Data withholding and the next generation of scientists: results of a national survey. Acad Med 81(2), 128-136.
  16.  Wren JD. (2005) Open access and openly accessible: a study of scientific publications shared via the internet. BMJ 330(7500), 1128.
  17.  Piwowar HA et al. (2007) Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3).
  18.  Eysenbach G. (2006) Citation advantage of Open Access articles. PLoS Biology 4(5), e157.
  19.  Murray-Rust P. (2008) Chemistry for everyone. Nature 451, 648-651.
  20.  Foster MW and Sharp RR. (2007) Share and share alike: deciding how to distribute the scientific and social benefits of genomic data. Nat Rev Genetics 8, 633-639.
  21.  Butler D. (2007) Data sharing: the next generation. Nature 446, 10-11.
  22.  Murray-Rust P. (2008) Open Data in Science. Available from Nature Precedings. <http://hdl.handle.net/10101/npre.2008.1526.1>
  23.  Bradley J. (2007) Open Notebook Science Using Blogs and Wikis. Available from Nature Precedings. <http://dx.doi.org/10.1038/npre.2007.39.1>
  24.  Bradley J. (2008) Useful Chemistry. <http://usefulchem.blogspot.com>
  25.  Singh D. (2008) Open Science: business|bytes|genes|molecules. <http://mndoci.com/blog/category/science/open-science/>
  26.  Neylon C. (2008) Science in the Open. <http://blog.openwetware.org/scienceintheopen>
  27.  Wu S. (2008) One Big Lab. <http://onebiglab.blogspot.com>
  28.  Tapscott D & Williams AD. (March 2, 2007) “The new science of sharing.” BusinessWeek. <http://www.businessweek.com/print/innovate/content/mar2007/id20070302_219704.htm>
  29.  Vickers A. (January 22, 2008) “Cancer data? Sorry, can't have it.” The New York Times. <http://www.nytimes.com/2008/01/22/health/views/22essa.html?ref=views>
  30.  Goetz T. (September, 2007) “Freeing the dark data of failed scientific experiments.” Wired.
  31.  Waldrop MM. (January 9, 2008) “Science 2.0: Great new tool, or great risk?” Scientific American. <http://www.sciam.com/article.cfm?id=science-2-point-0-great-new-tool-or-great-risk>

Appendix A: Statements of endorsement from organizers' institutions

To whom it may concern:

I can confirm that, having discussed the proposal with my management at STFC Rutherford Appleton Laboratory, my involvement in the proposed session is supported. I commit to attending the meeting if the session is accepted and to providing the time required to organise and coordinate the session.

Yours sincerely,

Cameron Neylon

Senior Scientist, Biomolecular Sciences

ISIS Facility

STFC Rutherford Appleton Laboratory

Harwell Science and Innovation Campus

Didcot, OX11 0QX, United Kingdom

From:   Russ Altman

Subject: PSB Endorsement

Date: January 30, 2008 8:40:36 PM PST

To:   Shirley Wu

Shirley,

    I am writing this letter as your advisor and as Director of your BMI PhD program.   I support your efforts to propose and then (if successful) help run a session at PSB on open science.  You and I have discussed the time commitment and I believe that this is a reasonable use of your time, as you begin to think about activities you may want to pursue after your graduation.  I therefore am supportive.  Good luck in the proposal.

Thanks,

Russ

Appendix B: Letters of support

From:           Heather Piwowar

Subject:         Letter of support for PSB proposal on Open Science

Date:         February 8, 2008 4:46:15 AM PST

To:           Shirley Wu

Dear PSB organizers,

I fully support the proposal for a session on Open Science at PSB 2009, and commit to submitting a research paper on data sharing and reuse.

The specific research topic will be derived from my doctoral dissertation, related to measuring the prevalence, patterns, causes, benefits, and motivations for biomedical data sharing and reuse.  I have a previous publication in this area ("Sharing Detailed Research Data Is Associated with Increased Citation Rate" at PLoS ONE), a few posters (including one at PSB 2008), and several papers in draft.  The paper will be co-authored with Dr Wendy Chapman.

I believe that Open Science definitely constitutes a "hot topic" within biocomputing, and has the potential to fundamentally change the way we think about our work.  The topic is relevant to data producers and data consumers, biologists and computer scientists, all with varied perspectives.

Discussion and measurement of benefits, hurdles, progress, and best practices could take (and is taking) place in blogs, the popular press, Birds-of-a-Feather sessions, and scattered research papers.  A session at PSB would be a unique opportunity to give this emerging meta-approach the serious examination it deserves.

Thank you for considering this proposal.

Sincerely,

Heather Piwowar

Doctoral Student

Department of Biomedical Informatics

University of Pittsburgh

From:           Wendy Chapman

Subject:         Open Science Letter of Support

Date:         February 8, 2008 11:29:36 AM PST

To:           Shirley Wu

Shirley,

 

I am writing in support of the idea of a PSB 2009 session on Open Science. I would be happy to contribute to that session by submitting a paper. The topic would be Compiling a Repository of Automatically De-identified Clinical Records Available for NLP Research. I am involved in several efforts to compile such data and to determine the types of annotations that should be performed on the data and should be able to summarize the political and technical issues that are facing us.

 

Best of luck,

 

Wendy Chapman

Assistant Professor

Department of Biomedical Informatics

University of Pittsburgh

 

From:           Jonathan Lustgarten

Subject:         Support for Open Science

Date:         February 8, 2008 10:59:46 AM PST

To:           Shirley Wu

Hello!

 

When speaking with Heather Piwowar, she mentioned your proposal and I wanted to voice my support for your endeavor.  I am planning on submitting (and hopefully attending) a topic in this area, specifically the sharing of biomarker and m/z values.  Possible paper titles include:

 

The search for more biomarkers:  Difficulty in reporting, collating and organizing current literature

 

Or

 

A study in building an ontology to share published biomarkers

 

Thanks for organizing this!

 

Sincerely,

Jonathan Lustgarten

 

Jonathan L. Lustgarten, M.S.

Bioinformatics Fellow

Department of Biomedical Informatics

University of Pittsburgh School of Medicine

From:           Pedro Beltrao

Subject:         letter of support for PSB Session on Open Science

Date:         February 8, 2008 3:49:05 PM PST

To:           Shirley Wu

Dear Shirley Wu,

I am writing in support of a PSB session on Open Science. I have been following with great interest the developments in the area and agree that the potential of web tools in science is tremendous and generally overlooked. If such a session is accepted I would attend and would commit myself to submit a presentation. This presentation could be either a general presentation about web tools for scientists (1) or a presentation about an open science project that was recently started by me using a code repository as project management tool (2).

1) Web tools for scientist - A presentation detailing the importance in the democratization of distribution channels and the impact this could have in science. This would cover the origins and impact of so-called web2.0 tools for the masses and the slow uptake of similar concepts for science applications. It would end with a discussion of trends related to how the online world is changing, or could change, how we evaluate scientific findings and the accessibility of raw results and collaborations.

2) Determinants of domain family expansion during evolution - A presentation about a project that I have recently started, shared in the Google Code project repository, as an experiment in open science. This would focus mainly on the experience of conducting a science project shared among different people using a code repository to assign tasks and organize results. Depending on the status of the project and the interest of the results at the time the scientific outcome of the project could also be presented.

Pedro Beltrao

Postdoctoral fellow , UCSF

From: Jeremy Frey

Date: Feb 7, 2008 6:26 PM

Subject: PSB 2009

To: Cameron Neylon

Cameron,

    I would like to support the idea of an Open Science Session at PSB 2009.  I would be very happy to contribute to such a session and a possible title could be

"Biological and Chemical Research -  Open or at last Ajar" :  Blogs, Logs and Pods in and for the laboratory

         

    and/or

"Repositories of Knowledge"  How to make the most of your work

         

Jeremy

PS we can optimize how we might present all the Southampton work on ELS, Blogs, Repositories etc between any/all of us who go

Jeremy Frey

School of Chemistry

University of Southampton

Southampton

SO17 1BJ

UK

from        Jason Kelly

to    Cameron Neylon

date        Feb 8, 2008 3:11 PM

subject        Re: PSB session

        

Sorry I didn't realize you needed more info than I sent previously. I'm on phone sorry this is short. I'd be happy to give a talk and submit an abstract about the many open science initiatives that are happening on Openwetware. I would need a travel grant however as we don't have funds to get me out there. Feel free to rewrite that as needed.

Thanks good luck getting it set up,

Jason

From: Carole Goble

Sent: 05 February 2008 11:34

To: Cameron Neylon

Cc: carole goble; David De Roure

Subject: Re: MyExperiment Contribution to Pacific Symposium on Biocomputing?

Cameron

we certainly will

Carole

> Dear Carole

>

> I seem to be bombarding you with requests this week. A group of us are

> proposing a session on the general theme of 'Open Science' at the

> Pacific Symposium on Biocomputing, to be held in Hawaii in early January

> next year. The session aims to cover tools for, policy, social issues

> and studies of the effectiveness of open practises in science.

> MyExperiment obviously fits as a tool that enables sharing and it would

> therefore be valuable to have a presentation on its state and

> direction.

>

> If the session is selected we would be seeking conference papers in

> around June associated with oral presentations to be given at the

> meeting. These are peer reviewed and published as a proceedings volume

> but we would also be insisting on authors retaining copyright. We are

> currently seeking indications as to whether people and groups would be

> interested in contributing a paper as this will strengthen the case for

> the session.

>

> The current draft version of the session proposal is available at;

>

> http://docs.google.com/Doc?id=dv4t5rx_33fpxx9pw5

>

> Thanks

>

> Cameron

From:           Peter Murray-Rust

Subject:         open science support

Date:         February 8, 2008 12:50:08 AM PST

To:           Shirley Wu

This is short as I'm on the road and a bad connection.

I strongly support the idea of Open Science and lent my weight to Cameron Neylon's recent application. Open Science requires that data are made available, not common in some of the subjects which bioscience interacts with, such as chemistry and instrumentation. Moreover much data is lost during the process of conducting and publishing data in the classical manner whereas true Open Science captures the data at source and preserves it for re-use.

Peter