1 of 24

Tidyomics - a tool for multiple omics analysis?

Min Hyung Ryu

October 18, 2023

CDNM Multiple Omics Meeting

2 of 24

Tidyverse and tidyomics

https://github.com/tidyomics

https://r4ds.hadley.nz/

3 of 24

Slides adapted from the following sources:

Credit: Mike Love, Maria Doyle, Stefano Mangiola

Resources:

https://tidyomics.github.io/tidyomicsWorkshopBioc2023/articles/tidyGenomicsTranscriptomics.html

https://stemangiola.github.io/tidySingleCellExperiment/

https://github.com/tidyomics/tidy-genomics-talk/tree/main

5 of 24

Consider the dataset below:

	Treatment A	Treatment B
Mike	-	2
John	85	45
Mary	10	5

There are three variables:

Person
Treatment
Test result

6 of 24

An Example: tidy version

Person	Treatment	Test result
Mike	A	-
John	A	85
Mary	A	10
Mike	B	2
John	B	45
Mary	B	5
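As an illustration, the reshaping from the wide table on slide 5 to this tidy table can be sketched with tidyr's pivot_longer(). The column names (person, A, B, treatment, result) are chosen for this sketch. The "-" is treated as a missing value (NA), and the values are copied from the slide 5 table.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Wide ("messy") layout from slide 5: one column per treatment.
# The "-" for Mike under treatment A is represented as NA.
messy <- tribble(
  ~person, ~A, ~B,
  "Mike",  NA, 2,
  "John",  85, 45,
  "Mary",  10, 5
)

# Tidy layout: one row per (person, treatment) observation.
tidy <- messy |>
  pivot_longer(
    cols = c(A, B),            # these columns are really values of "treatment"
    names_to = "treatment",    # old column names become this variable
    values_to = "result"       # cell values become this variable
  ) |>
  arrange(treatment)           # order rows by treatment, as in the tidy table

tidy
```

Missing values are kept by default (values_drop_na = FALSE), so Mike's untested treatment A stays visible as NA.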

7 of 24

Tibble data structure
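A short sketch of standard tibble behaviour compared with a base data.frame. The gene names and counts are made up for illustration.

```r
library(tibble)

# tibble() builds columns in order, so later columns can use earlier ones
tb <- tibble(
  gene      = c("GENE1", "GENE2", "GENE3"),   # hypothetical gene names
  count     = c(10L, 0L, 25L),
  log_count = log1p(count)                    # refers to the column defined above
)

tb                       # prints the dimensions and each column's type
class(tb)                # "tbl_df" "tbl" "data.frame": still a data.frame
is.character(tb$gene)    # strings stay character (no factor conversion)

# Single-column subsetting with [ keeps the tibble shape (no silent drop)
tb[, "count"]

# The same subsetting on a base data.frame drops to a plain vector
as.data.frame(tb)[, "count"]
```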

8 of 24

Advantages of Tidyverse

1. Consistency:

promotes a consistent and coherent approach to data analysis.
follows a unified grammar and design philosophy.

2. Readability:

code is more readable and expressive.
follows conventions resembling natural language.

3. Piping:

uses the `%>%` (magrittr) or `|>` (native, R >= 4.1) pipe operator to chain data manipulation operations.
enhances code clarity and flow.

4. Data Manipulation:

packages like dplyr and tidyr simplify data wrangling.
includes functions for filtering, grouping, summarizing, and reshaping data.
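To illustrate the piping point, the sketch below writes the same dplyr computation three ways: nested calls, the magrittr pipe, and the native pipe. It uses R's built-in mtcars dataset purely as a stand-in.

```r
library(dplyr)

# Nested calls must be read inside-out...
nested <- summarise(group_by(filter(mtcars, cyl != 6), cyl), mean_mpg = mean(mpg))

# ...while a pipe reads top to bottom, one step per line.
piped_magrittr <- mtcars %>%
  filter(cyl != 6) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# The native pipe |> (base R >= 4.1) gives the same result in this chain.
piped_native <- mtcars |>
  filter(cyl != 6) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
```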

9 of 24

Verb-based operations
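A minimal sketch of the core dplyr verbs applied to a hypothetical per-gene table. The gene names, chromosomes, and counts are illustrative only.

```r
library(tibble)
library(dplyr)

# Hypothetical per-gene table (illustrative values only)
genes <- tibble(
  gene_name = c("GENE1", "GENE2", "GENE3", "GENE4"),
  chrom     = c("chr1", "chr1", "chr2", "chr2"),
  count     = c(120, 0, 45, 300)
)

result <- genes |>
  filter(count > 0) |>                     # keep rows: expressed genes only
  mutate(log_count = log2(count + 1)) |>   # add a column derived from another
  select(gene_name, chrom, log_count) |>   # keep columns by name
  arrange(desc(log_count))                 # sort rows, highest first

result
```

Each verb takes a tibble and returns a tibble, which is what lets them be chained with a pipe.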

10 of 24

Summarize after grouping

11 of 24

Summarized output
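A sketch of grouping followed by summarising, on hypothetical expression values for two cell types. group_by() only marks the grouping; summarise() then collapses each group to a single row.

```r
library(tibble)
library(dplyr)

# Hypothetical expression values for genes measured in two cell types
expr <- tibble(
  cell_type  = c("B cell", "B cell", "T cell", "T cell", "T cell"),
  gene_name  = c("GENE1", "GENE2", "GENE1", "GENE2", "GENE3"),
  expression = c(2.0, 4.0, 1.0, 3.0, 8.0)
)

summary_tbl <- expr |>
  group_by(cell_type) |>
  summarise(
    n_genes         = n(),               # rows in each group
    mean_expression = mean(expression),  # one value per group
    .groups = "drop"                     # return an ungrouped tibble
  )

summary_tbl
```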

12 of 24

Advantages of Tidyverse

5. Visualization:

ggplot2 creates customizable, publication-quality visualizations.
builds complex plots from layers, following a "grammar of graphics".

6. Data Import:

readr and readxl simplify importing data from various formats.
readr reads CSV and other delimited text files; readxl reads Excel files.

7. Data Exploration:

dplyr and tidyr facilitate data exploration and pattern identification.

8. Package Ecosystem:

integrates well with other R packages and libraries.
extends the power and capabilities of R.
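A small sketch of the grammar-of-graphics idea in ggplot2. It uses the built-in mtcars dataset as a stand-in and composes a plot from data, aesthetic mappings, a geometry, labels, and a theme.

```r
library(ggplot2)

# Layers are added with +: data and aesthetic mappings, then a geometry,
# then labels and a theme.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_bw()

# print(p) draws the plot; ggsave("plot.pdf", p) would write it to a file
```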

13 of 24

Tibble data structure

17 of 24

Example: genomic and transcriptomic data integration

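# edb: an EnsDb annotation object (e.g. from an EnsDb.* package or AnnotationHub)
g <- ensembldb::genes(edb)  # GRanges with one range per gene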

18 of 24

Keep only genes that were detected in the scRNA-seq data

Retrieve all the gene names in your scRNA-seq data
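A minimal sketch of this filtering step, using a plain tibble as a hypothetical stand-in for the gene annotation and a made-up vector of detected gene names. In the real workflow g is a GRanges. plyranges provides dplyr-style verbs such as filter() for GRanges, so a similar filter(gene_name %in% ...) step can apply there. If the SingleCellExperiment rows are named by gene name, rownames() would supply the vector.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the gene annotation (a few made-up genes)
genes <- tibble(
  gene_name = c("GENE1", "GENE2", "GENE3", "GENE4"),
  seqnames  = c("chr1", "chr1", "chr2", "chr2"),
  tss       = c(1000, 50000, 2000, 90000)
)

# Hypothetical gene names detected in the scRNA-seq data
sc_gene_names <- c("GENE1", "GENE3", "GENE9")

# Keep only annotated genes that also appear in the scRNA-seq data
genes_in_sc <- genes |>
  filter(gene_name %in% sc_gene_names)
```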

19 of 24

Clean the data from ChIP-seq

Chromatin accessibility

Let’s see whether genes near peaks of an active chromatin mark (H3K4me3, measured with ChIP-seq) differ in expression.

20 of 24

Derive the distance from each gene (scRNA-seq) to the nearest H3K4me3 peak (ChIP-seq)
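A simplified sketch of the nearest-peak distance on hypothetical data. Genes are reduced to single TSS positions and peaks to single centre positions in plain tibbles. This is a stand-in technique for illustration. With real GRanges objects, GenomicRanges::distanceToNearest() computes range-to-range distances directly. The join uses the relationship argument, which assumes dplyr >= 1.1.0.

```r
library(tibble)
library(dplyr)

# Hypothetical inputs: gene TSS positions and H3K4me3 peak centres
genes <- tibble(
  gene_name = c("GENE1", "GENE2", "GENE3"),
  seqnames  = c("chr1", "chr1", "chr2"),
  tss       = c(1000, 50000, 2000)
)
peaks <- tibble(
  seqnames = c("chr1", "chr1", "chr2"),
  peak_pos = c(1200, 20000, 10000)
)

# Pair every gene with every peak on the same chromosome, then keep the
# smallest absolute distance per gene. Genes on chromosomes without any
# peak are dropped by inner_join().
gene_dist <- genes |>
  inner_join(peaks, by = "seqnames", relationship = "many-to-many") |>
  group_by(gene_name) |>
  summarise(distance = min(abs(peak_pos - tss)), .groups = "drop")

gene_dist
```

The all-pairs join is fine for a toy example but grows quickly. That cost is one reason interval-aware tools such as GenomicRanges are preferred for genome-scale data.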

21 of 24

Make the gene-to-peak distance into a categorical variable
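One way to bin a continuous distance is base R's cut() inside mutate(). The distances and the cut points (1 kb, 10 kb) below are hypothetical choices for illustration, not the thresholds used in the original analysis.

```r
library(tibble)
library(dplyr)

# Hypothetical gene-to-peak distances in base pairs
gene_dist <- tibble(
  gene_name = c("GENE1", "GENE2", "GENE3", "GENE4"),
  distance  = c(0, 800, 8000, 30000)
)

# Left-closed bins [-Inf, 1e3), [1e3, 1e4), [1e4, Inf) with readable labels
gene_dist <- gene_dist |>
  mutate(
    dist_cat = cut(
      distance,
      breaks = c(-Inf, 1e3, 1e4, Inf),
      right  = FALSE,
      labels = c("<1kb", "1-10kb", ">=10kb")
    )
  )

gene_dist
```

cut() returns a factor whose levels keep the bins in distance order, which also keeps plot axes ordered.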

22 of 24

Summarize nested data for each cell type

Application:

You can nest by cell type, disease, donor, etc.
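A sketch of the nest-then-summarise pattern with tidyr and purrr. It uses a hypothetical long table of per-gene expression with distance categories for two cell types. The column names and values are assumptions. The anonymous-function syntax \(d) needs R >= 4.1.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical long table: one row per (cell type, gene)
expr <- tibble(
  cell_type  = rep(c("B cell", "T cell"), each = 4),
  gene_name  = rep(c("GENE1", "GENE2", "GENE3", "GENE4"), times = 2),
  dist_cat   = rep(c("<1kb", "<1kb", ">=10kb", ">=10kb"), times = 2),
  expression = c(5, 3, 1, 1, 4, 6, 2, 0)
)

# nest() packs each cell type's rows into a list-column of tibbles;
# map() then summarises each nested tibble separately
by_cell_type <- expr |>
  nest(data = -cell_type) |>
  mutate(
    dist_summary = map(data, \(d) d |>
      group_by(dist_cat) |>
      summarise(mean_expression = mean(expression), .groups = "drop"))
  )

# unnest() expands the per-cell-type summaries back into one tidy table
result <- by_cell_type |>
  select(cell_type, dist_summary) |>
  unnest(dist_summary)

result
```

For a plain mean, group_by(cell_type, dist_cat) followed by summarise() gives the same table. Nesting becomes useful when each group needs a more involved per-group step, such as fitting a model.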

23 of 24

Visualize the results
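A sketch of one possible visualization with ggplot2: one panel per cell type, with bars for mean expression in each distance bin. The summary table is hypothetical and its values are made up for illustration.

```r
library(tibble)
library(ggplot2)

# Hypothetical summary: mean expression per cell type and distance bin
summary_tbl <- tibble(
  cell_type = rep(c("B cell", "T cell"), each = 3),
  dist_cat  = factor(rep(c("<1kb", "1-10kb", ">=10kb"), times = 2),
                     levels = c("<1kb", "1-10kb", ">=10kb")),
  mean_expression = c(4, 2.5, 1, 5, 3, 1)
)

# One facet per cell type; bar height is the pre-computed mean
p <- ggplot(summary_tbl, aes(x = dist_cat, y = mean_expression)) +
  geom_col() +
  facet_wrap(~ cell_type) +
  labs(x = "Distance from gene to nearest H3K4me3 peak",
       y = "Mean expression")
```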

24 of 24

Other applications?

RNA-seq and methylation integration
CITE-seq (RNA-seq and surface protein markers)
Single-cell multiple omics integration
Metabolomics and proteomics

Summary

The tidy data paradigm allows clean and scalable data analysis

Powerful visualization for exploration and publications

Integration of genomics and transcriptomics

Tidyomics is an expanding set of tools that applies the “tidy” approach to omics research