Tidyomics - a tool for multiple omics analysis?
Min Hyung Ryu
October 18, 2023
CDNM Multiple Omics Meeting
Tidyverse and tidyomics
https://github.com/tidyomics
https://r4ds.hadley.nz/
Slides adapted from the following sources:
Credit: Mike Love, Maria Doyle, Stefano Mangiola
Resources:
https://tidyomics.github.io/tidyomicsWorkshopBioc2023/articles/tidyGenomicsTranscriptomics.html
https://stemangiola.github.io/tidySingleCellExperiment/
https://github.com/tidyomics/tidy-genomics-talk/tree/main
Consider the dataset below:
Person | Treatment A | Treatment B |
Mike | - | 2 |
John | 85 | 45 |
Mary | 10 | 5 |
There are three variables: person, treatment, and test result.
An example: the tidy version (each variable is a column, each observation is a row)
Person | Treatment | Test result |
Mike | A | - |
John | A | 85 |
Mary | A | 10 |
Mike | B | 2 |
John | B | 45 |
Mary | B | 5 |
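A minimal sketch of how the wide table becomes the tidy one, assuming the tidyr and tibble packages are installed. The column names `person`, `treatment_A`, `treatment_B`, and `result` are illustrative choices, and the missing value "-" is encoded as `NA`.

```r
library(tibble)
library(tidyr)

# Messy (wide) layout: one column per treatment, "-" encoded as NA
messy <- tibble(
  person      = c("Mike", "John", "Mary"),
  treatment_A = c(NA, 85, 10),
  treatment_B = c(2, 45, 5)
)

# Tidy (long) layout: one row per person x treatment observation
tidy <- pivot_longer(
  messy,
  cols         = c(treatment_A, treatment_B),
  names_to     = "treatment",
  names_prefix = "treatment_",  # strip the prefix so values become "A" and "B"
  values_to    = "result"
)
tidy
```

The result has six rows and three columns, one per variable; only the layout changes, not the values.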
Tibble data structure
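A small sketch of a tibble, assuming the tibble package. It shows that a tibble is still a data frame and that a column can hold a list; such list-columns are what later allows nested data per cell type. The gene names are example marker genes used only for illustration.

```r
library(tibble)

# A tibble is a data frame with stricter, more predictable behaviour
results <- tibble(
  person = c("Mike", "John", "Mary"),
  result = c(2, 45, 5)
)
class(results)  # "tbl_df" "tbl" "data.frame"

# Columns can be lists: here each cell type carries a vector of example marker genes
markers <- tibble(
  cell_type = c("T cell", "B cell"),
  genes     = list(c("CD3E", "CD3D"), "MS4A1")
)
markers
```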
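Advantages of the tidyverse
1. Consistency: packages share the same design philosophy, grammar, and data structures
2. Readability: code reads as a sequence of descriptive verbs
3. Piping: steps are chained with the pipe (%>% or |>), so each step's output feeds the next
4. Data manipulation: dplyr and tidyr cover filtering, transforming, reshaping, and summarizing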
Verb-based operations
Summarize after grouping
Summarized output
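A minimal sketch of piping, verb-based operations, and summarizing after grouping, applied to the tidy treatment table above. It assumes dplyr and R >= 4.1 for the native pipe `|>`; `%>%` works the same way.

```r
library(dplyr)

tidy <- tibble(
  person    = rep(c("Mike", "John", "Mary"), times = 2),
  treatment = rep(c("A", "B"), each = 3),
  result    = c(NA, 85, 10, 2, 45, 5)
)

summary_by_treatment <- tidy |>
  filter(!is.na(result)) |>   # verb: drop the missing measurement
  group_by(treatment) |>      # one group per treatment
  summarise(n = n(), mean_result = mean(result))  # one output row per group

summary_by_treatment
```

Each line is one verb, so the pipeline reads top to bottom in the order the operations happen.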
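Advantages of the tidyverse (continued)
5. Visualization: ggplot2 plots tidy data frames directly
6. Data import: readr, readxl, and haven read files straight into tibbles
7. Data exploration: quick summaries with verbs such as count(), summarise(), and glimpse()
8. Package ecosystem: many packages, including the tidyomics packages, follow the same conventions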
Tibble data structure
Example: genomic and transcriptomic data integration
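# `edb` is assumed to be an EnsDb annotation object, e.g. edb <- EnsDb.Hsapiens.v86
g <- ensembldb::genes(edb)  # all annotated genes, returned as a GRanges object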
Collect all the gene names in your scRNA-seq data
Keep only the genes that were found in the scRNA-seq data
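A minimal sketch of the filtering step, assuming dplyr. To stay self-contained it uses a toy tibble in place of the GRanges `g` and a toy character vector in place of the gene names of a hypothetical scRNA-seq object (e.g. `rownames(sce)`); all names and coordinates are hypothetical. plyranges provides the same `filter()` verb for GRanges objects.

```r
library(dplyr)

# Toy stand-in for the gene annotation (hypothetical names and positions)
genes_tbl <- tibble(
  gene_name = c("gene1", "gene2", "gene3", "gene4"),
  seqnames  = c("chr1", "chr1", "chr2", "chr2"),
  tss       = c(1000, 50000, 2000, 90000)
)

# Toy stand-in for the gene names measured in the scRNA-seq data
sc_gene_names <- c("gene1", "gene3", "gene4", "gene9")

# Keep only annotated genes that were also found in the scRNA-seq data
genes_in_sc <- genes_tbl |>
  filter(gene_name %in% sc_gene_names)

genes_in_sc
```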
Clean the ChIP-seq peak data
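One possible cleaning step, sketched with dplyr on a toy peak table: keep peaks on standard chromosomes, drop low-scoring peaks, and record each peak's center. The peak values, the `score` column, and the threshold of 100 are hypothetical; real peaks would come from the peak caller's output, and a sensible threshold depends on that caller.

```r
library(dplyr)

# Toy ChIP-seq peaks (hypothetical values), including one non-standard contig
peaks_raw <- tibble(
  seqnames = c("chr1", "chr1", "chrUn_gl000220", "chr2", "chr2"),
  start    = c(900, 70000, 100, 1500, 40000),
  end      = c(1400, 70600, 400, 2600, 40300),
  score    = c(850, 120, 900, 600, 40)
)

standard_chroms <- paste0("chr", c(1:22, "X", "Y"))
min_score <- 100  # hypothetical cut-off

peaks <- peaks_raw |>
  filter(seqnames %in% standard_chroms, score >= min_score) |>
  mutate(center = (start + end) %/% 2)  # integer midpoint of each peak

peaks
```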
Chromatin accessibility
Let’s see whether genes near peaks of an active chromatin mark (H3K4me3, measured with ChIP-seq) show higher expression in the scRNA-seq data
Compute the distance from each gene (scRNA-seq) to the nearest H3K4me3 peak (ChIP-seq)
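A simplified sketch of a gene-to-nearest-peak distance using plain tibbles and dplyr (assumed >= 1.1.0 for the `relationship` argument). It pairs each gene with every peak on the same chromosome and keeps the smallest absolute distance between a single gene coordinate (`tss`) and a peak center. This is a stand-in for range-based tools such as `GenomicRanges::distanceToNearest()` or plyranges, which handle strand and interval overlap properly; all values are toy data.

```r
library(dplyr)

# Toy inputs (hypothetical values): one coordinate per gene, cleaned peak centers
genes <- tibble(
  gene_name = c("gene1", "gene3", "gene4"),
  seqnames  = c("chr1", "chr2", "chr2"),
  tss       = c(1000, 2000, 90000)
)
peaks <- tibble(
  seqnames = c("chr1", "chr1", "chr2"),
  center   = c(1150, 70300, 2050)
)

# Every gene x peak pair on the same chromosome, then the closest peak per gene
gene_peak_dist <- genes |>
  inner_join(peaks, by = "seqnames", relationship = "many-to-many") |>
  mutate(distance = abs(tss - center)) |>
  group_by(gene_name) |>
  summarise(distance = min(distance), .groups = "drop")

gene_peak_dist
```

Because of the inner join, genes on chromosomes with no peaks drop out; a left join would keep them with a missing distance instead.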
Convert the gene-to-peak distance into a categorical variable
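A minimal sketch of the binning step with dplyr and base R's `cut()`. The toy distances and the break points (1 kb and 10 kb) are illustrative assumptions, not values from the original analysis.

```r
library(dplyr)

# Toy distances (hypothetical) from each gene to its nearest peak, in bp
gene_peak_dist <- tibble(
  gene_name = c("gene1", "gene3", "gene4", "gene5"),
  distance  = c(150, 50, 87950, 4000)
)

# Bin into ordered categories; intervals are right-closed: (-Inf, 1e3], (1e3, 1e4], (1e4, Inf)
gene_peak_dist <- gene_peak_dist |>
  mutate(distance_class = cut(
    distance,
    breaks = c(-Inf, 1e3, 1e4, Inf),
    labels = c("0-1 kb", "1-10 kb", ">10 kb")
  ))

gene_peak_dist
```

`cut()` returns a factor whose levels keep the bins in distance order, which is also the order a plot will use.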
Summarize nested data for each cell type
Application:
You can nest by cell type, disease, donor, etc.
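A sketch of nesting per cell type and summarizing each nested table, assuming dplyr and tidyr. The input is a toy table with one row per gene and cell type; the column names (`mean_expr`, `avg_expr`) and all values are hypothetical. Replacing `cell_type` with a disease or donor column nests by that variable instead.

```r
library(dplyr)
library(tidyr)

# Toy per-cell-type expression (hypothetical values): one row per gene x cell type
expr <- tibble(
  cell_type      = rep(c("T cell", "B cell"), each = 4),
  gene_name      = rep(c("gene1", "gene3", "gene4", "gene5"), times = 2),
  distance_class = rep(c("0-1 kb", "0-1 kb", ">10 kb", "1-10 kb"), times = 2),
  mean_expr      = c(2.0, 1.5, 0.2, 0.8, 1.0, 2.5, 0.1, 0.4)
)

# One row per cell type; `data` holds that cell type's gene-level table
nested <- expr |>
  nest(data = -cell_type)

# Summarise each nested table: average expression per distance class
nested <- nested |>
  mutate(summary = lapply(data, function(d) {
    d |>
      group_by(distance_class) |>
      summarise(avg_expr = mean(mean_expr), .groups = "drop")
  }))

# Flatten to one row per cell type x distance class
per_cell_type <- nested |>
  select(cell_type, summary) |>
  unnest(summary)

per_cell_type
```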
Visualize the results
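One way to plot the per-cell-type summary, sketched with ggplot2 on a toy table shaped like the nested-summary output (hypothetical values). Setting the factor levels keeps the distance classes in order on the x axis.

```r
library(ggplot2)

# Toy summary (hypothetical values): one row per cell type x distance class
per_cell_type <- data.frame(
  cell_type      = rep(c("T cell", "B cell"), each = 3),
  distance_class = factor(rep(c("0-1 kb", "1-10 kb", ">10 kb"), times = 2),
                          levels = c("0-1 kb", "1-10 kb", ">10 kb")),
  avg_expr       = c(1.75, 0.8, 0.2, 1.75, 0.4, 0.1)
)

# Bar chart per cell type: does expression drop as the nearest peak gets farther away?
p <- ggplot(per_cell_type, aes(x = distance_class, y = avg_expr)) +
  geom_col() +
  facet_wrap(~ cell_type) +
  labs(x = "Distance from gene to nearest H3K4me3 peak",
       y = "Mean expression (toy values)")
```

Print `p` to draw the plot; with real data, the same code runs on the unnested summary table.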
Other applications?
Summary