1 of 48

What is Bioconductor?

Maria Doyle

Bioconductor Community Manager

maria.doyle@ul.ie

Slides: Lori Shepherd, Bioconductor Core Team

2 of 48

Bioconductor is not a package; it’s a package repository

Bioconductor is an R package repository system that provides tools for the analysis, comprehension, and visualization of genetic and genomic data. It is open source and open development, and relies on community-contributed R packages.

3 of 48

Just some of the many available Bioconductor packages!

4 of 48

Bioconductor is an organization

Advisory boards help shape the direction of the project and ensure technological relevance:

  • Bioconductor Foundation / European Bioconductor Society

501(c)(3) organization; handles money for conference organization, donations, and sponsorship

  • Scientific Advisory Board (SAB)

Provides external guidance and oversight of scientific direction of the project. Invitation Only Board.

  • Technical Advisory Board (TAB)

Advises on project- and package-level infrastructure. Open call to anyone; elections held annually.

  • Community Advisory Board (CAB)

Dedicated to developing, enhancing, and diversifying the Bioconductor community. Open Call to anyone; elections held annually.

  • Core Team

Developers who maintain, enhance, and develop core packages and project-level infrastructure

  • Global Bioconductor Community !!

Note:

The Bioconductor Foundation is being dissolved, as we now have NumFOCUS as fiscal sponsor.

5 of 48

Technical Advisory Board

https://bioconductor.org/about/technical-advisory-board/

Vince Carey

Charlotte Soneson

Levi Waldron

Sean Davis

Laurent Gatto

Ludwig Geistlinger

Helena Crowell

Kasper Daniel Hansen

Stephanie Hicks

Wolfgang Huber

Rafael Irizarry

Lori (Shepherd) Kern

Michael Love

Davide Risso

6 of 48

Community Advisory Board

https://bioconductor.org/about/community-advisory-board/

Kevin Rue-Albrecht

Johannes Rainer

Hedia Tnani

Mike Smith

Leo Lahti

Luyi Tian

Kozo Nishida

Nicole Ortogero

Daniela Cassol

Aedin Culhane

Maria Doyle

Lori (Shepherd) Kern

Enis Afgan

Estefania Mancini

Umar Ahmad

Xueyi Dong

Stevie Pederson

Mengbo Li

Jiefei Wang

Jordana Muqanguzi

Janani Ravi

7 of 48

Core Team

https://bioconductor.org/about/core-team/

Lori (Shepherd) Kern

Vince Carey

Alexandru Mahmoud

Herve Pages

Marcel Ramos

Robert Shear

Jennifer Wokaty

Kayla Interdonato

Nikhil Mane

8 of 48

Core Team: It’s Not Just the Packages

Just a few of many Bioconductor Core Maintained Packages

9 of 48

Core Team: Infrastructure Development and Maintenance

  • Develop and maintain Bioconductor infrastructure packages
  • Bioconductor.org maintenance
  • Support.bioconductor.org maintenance
  • Answering questions on all outlets
  • Bioconductor Build System (BBS) for daily builds and reporting
  • Docker Image generation
  • Binary Package generation (AnVIL/Docker)
  • New Package Submission (SPB) process and review
  • AWS infrastructure
  • Azure infrastructure
  • OSN infrastructure
  • Jetstream2 processes
  • Maintaining the git ecosystem
  • Package download stats
  • BiocManager maintenance
  • Bioconductor ExperimentHub and AnnotationHub integration
  • Outreachy projects
  • And more… so if we don’t get to your question right away, please be kind!

10 of 48

Bioconductor is a community!

Bioconductor provides resources and infrastructure to connect experienced package maintainers, developers and users with those who are less experienced, fostering a collaborative and welcoming community of R Bioconductor users.

11 of 48

Ask questions about packages or data analysis

Thousands of Bioconductor users and maintainers are members

12 of 48

Community Slack: slack.bioconductor.org

13 of 48

Bioconductor Conferences / Workshops / Events

14 of 48

Brixen, Italy 2019

15 of 48

Developing Packages

While the Core Team maintains many ‘core infrastructure packages’, Bioconductor is largely community contributed.

Anyone can contribute a Bioconductor package.

Bioconductor has specific package requirements that must be met, and every package undergoes a review process. Once accepted, a package is added to the daily builder and is available through BiocManager.

16 of 48

Just some of the many available Bioconductor packages!

17 of 48

Working Groups and Committees

https://workinggroups.bioconductor.org/

  • Code of Conduct
  • Conference Planning
  • Cloud Methods
  • Developers Training
  • Education
  • Mass Spectrometry for Proteomics and Metabolomics
  • Multilingual
  • Package Review
  • Industry
  • Website
  • Social Media
  • Package Failure Notifications
  • Your new working group…?

18 of 48

Other Social Media

Bioconductor has other social media outlets to connect the community!

  • Community Slack: slack.bioconductor.org
  • Mastodon: https://genomic.social/@bioconductor
  • LinkedIn: https://www.linkedin.com/company/bioconductor

19 of 48

Note:

The Bioconductor website got a new look in 2024!

20 of 48

Bioconductor Basics

How do I get started?

21 of 48

How does a user interact with Bioconductor?

R and RStudio

AnVIL

Docker

22 of 48

Installing Bioconductor Packages: BiocManager

  • The BiocManager package on CRAN provides a function for installing packages from both the Bioconductor and CRAN repositories
  • To install any Bioconductor package (or CRAN package) in an R terminal or RStudio:

# download and install BiocManager

> install.packages("BiocManager")

# usage: pass one or more package names

> library(BiocManager)

> install(c("GenomicRanges", "Biostrings"))

# list all CRAN/Bioconductor packages available, or search by name

> BiocManager::available()

> BiocManager::available("Genomic")
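In scripts it can help to install only what is missing. The following is a minimal sketch, not an official recipe: `missing_pkgs()` is a hypothetical helper written for this example, the package names are only examples, and the install step is guarded so it runs only in an interactive session.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
# from the current R library.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("GenomicRanges", "SummarizedExperiment")  # example package names
to_install <- missing_pkgs(wanted)

# Only contact the repositories from an interactive session (an assumption
# made for this sketch so that non-interactive runs have no side effects).
if (interactive() && length(to_install) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
}
```

BiocManager::install() accepts a character vector, so several packages can be requested in one call.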

23 of 48

Bioconductor Package: AnVIL

AnVIL (NHGRI Analysis Visualization and Informatics Lab-space)

  • Analyze large, open & controlled-access genomic datasets with familiar tools and reproducible workflows in a secure cloud-based computing environment.

https://anvilproject.org/

AnVIL users can perform data analysis with Bioconductor in Jupyter Notebooks or RStudio.

  • The Bioconductor AnVIL package provides AnVIL::install() to install Bioconductor package binaries.

24 of 48

Bioconductor Packages: Docker

https://bioconductor.org/help/docker/

  • Release and devel Docker containers available
    • bioconductor/bioconductor_docker:devel
    • bioconductor/bioconductor_docker:RELEASE_X_Y
  • Install Docker
  • Pull the desired Bioconductor Docker image
  • Run
    • RStudio Server
    • command line (directly into R or as a bash shell)
  • Packages available as binary package install using BiocManager::install
  • Ability to modify base image as needed
  • Available on Singularity
  • Available on Microsoft Container Registry and Azure

25 of 48

Finding Bioconductor Packages

26 of 48

Bioconductor Home Page

27 of 48

28 of 48

Bioconductor Packages (biocViews) Page

29 of 48

Books coming soon!

30 of 48

Search for biocViews Terms

Search for package name

31 of 48

32 of 48

Bioconductor Package

Landing Page

33 of 48

All Bioconductor packages use git for source control

Starting from new package submission and review, a package is moved into the Bioconductor git ecosystem. All changes must be pushed to git.bioconductor.org to propagate to users.

34 of 48

Nightly Builds from git.bioconductor.org

  • Packages contributed to Bioconductor have a repository on git.bioconductor.org that should be updated to propagate changes to Bioconductor end users
  • Actions performed:
    • git clone
    • R CMD INSTALL
    • R CMD build
    • R CMD check
  • If a package builds and has a valid version bump to indicate a new package version, the package is propagated and available through BiocManager::install()
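What counts as a "valid version bump" is not spelled out on this slide. As a minimal sketch, assuming it means the new version is strictly higher than the previous one, base R's package_version() can do the comparison. `is_version_bump()` is a hypothetical helper, not part of the build system.

```r
# Hypothetical helper: TRUE when `new` is a strictly higher version than `old`.
# Assumption: "valid bump" means strictly greater; the builders may apply
# additional rules not covered here.
is_version_bump <- function(old, new) {
  package_version(new) > package_version(old)
}

is_version_bump("1.3.0", "1.3.1")   # TRUE
is_version_bump("1.3.9", "1.3.10")  # TRUE: components compare as numbers, not text
is_version_bump("1.3.1", "1.3.1")   # FALSE: same version, nothing to propagate
```

Comparing as versions rather than as strings matters: as plain text, "1.3.10" would sort before "1.3.9".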

35 of 48

36 of 48

37 of 48

38 of 48

39 of 48

Timestamp to know what day it was generated

Indicates package version, git commit and commit date that the builders used

Click on any stage for more information

40 of 48

Bioconductor Concepts

  • Bioconductor version is closely associated with an R version
  • Bioconductor has a release twice a year.
  • Bioconductor has a release and devel branch of packages
  • Versions of packages have significance

41 of 48

Why is there a release and a devel branch of Bioconductor?

Release timeline (Apr through Mar):

  • Spring Release to Fall Release: the current stable release of R (R-patched) on CRAN is used for both the Bioconductor release and devel branches
  • Fall Release to the next Spring Release: the current stable release of R (R-patched) on CRAN is used for the Bioconductor release branch; R-devel is used for the Bioconductor devel branch

42 of 48

Why is there a release and a devel branch of Bioconductor?

  • Bioconductor has the concept of a Release and a Devel version of every package.
    • Release is the stable, user-centric branch. Changes should be minimal and limited to fixing known issues/bugs
    • Devel is for new features, enhancements, and developments
      • Adapt to changes in base R before an official R release
        • http://contributions.bioconductor.org/troubleshooting-build-report.html
      • Adapt to package enhancements and changes driven by package interoperability. Bioconductor packages can be closely dependent on each other.

  • Bioconductor has two releases a year (Spring/Fall).
    • Spring - closely tied to the R release (normally scheduled one week after R release)
    • Fall - Bioconductor devel switches to using R-devel in preparation for spring R release

43 of 48

Versions of packages

  • Pre-release < 0.99.0
    • No longer permitted for submitted packages; only seen in local packages not yet submitted to Bioconductor.
  • Initial submission into Bioconductor 0.99.0
    • A few exceptions are made for x.99.0 on submission in very specific cases (e.g., moving from CRAN to Bioconductor)
  • On first Bioconductor release a package is generally 1.0.0
  • Bioconductor bumps versions at release automatically. For package version x.y.z:
    • y even = release
    • y odd = devel
  • BiocManager can validate your installation and package versions with
    • BiocManager::valid() (reports out-of-date or too-new packages)
    • BiocManager::install() (with no arguments, offers to update out-of-date packages)
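The even/odd rule for y can be checked mechanically. This is a small base-R sketch; `bioc_branch()` is a hypothetical helper written for illustration and is not part of BiocManager.

```r
# Hypothetical helper: classify a version string x.y.z as "release" or "devel"
# from the parity of its second component y.
bioc_branch <- function(version) {
  parts <- unclass(package_version(version))[[1]]  # integer vector c(x, y, z)
  if (parts[2] %% 2 == 0) "release" else "devel"
}

bioc_branch("1.0.0")   # "release" (y = 0 is even)
bioc_branch("1.1.3")   # "devel"   (y = 1 is odd)
bioc_branch("0.99.0")  # "devel"   (y = 99 is odd: a new submission)
```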

44 of 48

Other Bioconductor Concepts

interop/endomorphism

  • Interoperability
    • How: By reusing common data structures/data classes and existing functions (especially load/read)
    • Why: Users can make workflows easily without worrying about the format of their data

  • Endomorphism
    • We encourage developers to practice this when implementing functions so users know what to expect as output: “you get what you give”
    • Not always appropriate or implemented but encouraged
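A toy illustration of "you get what you give", using a plain data.frame rather than a Bioconductor class. The function name and the `gene`/`count` columns are made up for this sketch.

```r
# Endomorphic function: takes a data.frame and returns a data.frame
# with the same columns, so it can be chained with other such functions.
keep_expressed <- function(df, min_count = 10) {
  df[df$count >= min_count, , drop = FALSE]
}

genes <- data.frame(gene = c("a", "b", "c"), count = c(3, 25, 40))
res <- keep_expressed(genes)

class(res)  # "data.frame", the same class that went in
nrow(res)   # 2
```

Even when nothing passes the filter, the result is still a data.frame (with zero rows), so downstream code does not need a special case.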

45 of 48

Common Classes and Methods

https://contributions.bioconductor.org/important-bioconductor-package-development-features.html#reusebioc

46 of 48

Common Bioconductor Classes and Methods: Importing Data

  • GTF, GFF, BED, BigWig, etc. – rtracklayer::import()

  • VCF – VariantAnnotation::readVcf()

  • SAM / BAM – Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()

  • FASTA – Biostrings::readDNAStringSet()

  • FASTQ – ShortRead::readFastq()

  • MS data (XML-based and mgf formats) – Spectra::Spectra(), Spectra::Spectra(source = MsBackendMgf::MsBackendMgf())
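As a small sketch of one of the importers listed above, the following writes a two-record FASTA file to a temporary location and reads it back with Biostrings::readDNAStringSet(). The sequences are invented, and the read step is skipped if Biostrings is not installed.

```r
# Write a tiny FASTA file (made-up sequences) to a temporary path.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "GGCC"), fasta)

# Read it back only if Biostrings is available in this R library.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  seqs <- Biostrings::readDNAStringSet(fasta)
  print(class(seqs))         # a DNAStringSet, one of the common classes
  print(as.character(seqs))  # sequences as a named character vector
}
```

Because the importer returns a standard class, the result can be passed directly to other packages that accept a DNAStringSet.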

47 of 48

Common Bioconductor Classes and Methods: Classes

  • Rectangular feature x sample data – SummarizedExperiment::SummarizedExperiment() (RNAseq count matrix, microarray, …)

  • Genomic coordinates – GenomicRanges::GRanges() (1-based, closed interval)

  • Genomic coordinates from multiple samples – GenomicRanges::GRangesList()

  • Ragged genomic coordinates – RaggedExperiment::RaggedExperiment()

  • DNA / RNA / AA sequences – Biostrings::*StringSet()

  • Gene sets – BiocSet::BiocSet(), GSEABase::GeneSet(), GSEABase::GeneSetCollection()

  • Multi-omics data – MultiAssayExperiment::MultiAssayExperiment()

  • Single cell data – SingleCellExperiment::SingleCellExperiment()

  • Mass spec data – Spectra::Spectra()

  • File formats – BiocIO::`BiocFile-class`
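The "1-based, closed interval" convention noted for GRanges above is a common source of off-by-one errors when coordinates come from 0-based, half-open formats such as BED. This base-R sketch shows the arithmetic only; both helpers are hypothetical and do not use GenomicRanges itself.

```r
# Width of a 1-based, closed interval [start, end]: both ends are included.
closed_width <- function(start, end) end - start + 1L

# Convert 0-based, half-open coordinates (the BED convention) to
# 1-based, closed coordinates: shift the start by one, keep the end.
bed_to_closed <- function(start0, end0) {
  list(start = start0 + 1L, end = end0)
}

iv <- bed_to_closed(start0 = 0L, end0 = 100L)  # the first 100 bases
iv$start                        # 1
iv$end                          # 100
closed_width(iv$start, iv$end)  # 100
```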

48 of 48

Questions and Comments

We welcome any comments or questions anyone has on the presentation.

  • Email
    • maria.doyle@ul.ie (Bioconductor Community Manager)
    • lori.shepherd@roswellpark.org (Bioconductor Project Manager / Core Team)

  • community-bioc.slack.com
    • Display names: @Maria Doyle, @lshepherd