1 of 22

Cell type annotation

2 of 22

Previous lectures…

3 of 22

Gene Counts

http://data-science-sequencing.github.io/Win2018/lectures/lecture16/

https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2018/SingleCell/slides/2018-07-25_CRUK_CI_summer_school-scRNAseq.pdf

Align cDNA to the genome and group results by cell

  • For each gene within each cell, the number of unique UMIs is counted and reported as the number of transcripts of that gene in that cell.

Hundreds of millions of reads, thousands of cells

Count unique UMIs for each gene in each cell

Create digital expression matrix
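The counting step above can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the read tuples, barcodes, and gene names are made up, and the function names `count_umis` and `to_matrix` are hypothetical. Real tools typically also handle sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical aligned reads as (cell barcode, gene, UMI); all values are made up.
reads = [
    ("CELL_A", "Gene1", "AACG"),
    ("CELL_A", "Gene1", "AACG"),  # same UMI again: PCR duplicate, counted once
    ("CELL_A", "Gene1", "TTGC"),
    ("CELL_A", "Gene2", "GGTA"),
    ("CELL_B", "Gene1", "AACG"),
    ("CELL_B", "Gene2", "CCAT"),
    ("CELL_B", "Gene2", "CCAT"),  # duplicate
    ("CELL_B", "Gene2", "ATAT"),
]

def count_umis(reads):
    """Count unique UMIs for each (gene, cell) pair."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(gene, cell)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

def to_matrix(counts):
    """Arrange the counts as a genes x cells matrix; missing pairs become 0."""
    genes = sorted({gene for gene, _ in counts})
    cells = sorted({cell for _, cell in counts})
    matrix = [[counts.get((gene, cell), 0) for cell in cells] for gene in genes]
    return genes, cells, matrix

counts = count_umis(reads)
genes, cells, matrix = to_matrix(counts)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Using a set per (gene, cell) pair is what collapses PCR duplicates: reads that share a barcode, gene, and UMI are treated as copies of one original molecule.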

Gene counts matrix

4 of 22

          Cell 1   Cell 2   Cell N
Gene 1       5        6        5
Gene 2      13        0       13
Gene 3      14       13       14
Gene 4      18       19       18
Gene M      10       10       10

Interpretation of zeroes

5 of 22

Tissues and organs are heterogeneous

Sequencing the transcriptome of a group of heterogeneous cells (bulk seq) can mask substantial differences in the transcription level of an individual cell type.

The image on the right shows a preparation of the small intestine, which contains dozens of cell types.
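A toy calculation shows how averaging over a mixed population hides a cell-type-specific signal. The cell type names and expression values below are invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up expression of one gene in a mixed sample:
# 90 cells of an abundant type with low expression, 10 cells of a rare type with high expression.
cells = [("abundant_type", 1.0)] * 90 + [("rare_type", 50.0)] * 10

# A bulk-like measurement only sees the average over all cells.
bulk_like = mean(value for _, value in cells)

# A single-cell measurement lets us average within each cell type instead.
by_type = defaultdict(list)
for cell_type, value in cells:
    by_type[cell_type].append(value)
per_type = {cell_type: mean(values) for cell_type, values in by_type.items()}

print("bulk-like average:", bulk_like)
print("per-type averages:", per_type)
```

The bulk-like value sits between the two populations and says nothing about which cells carry the signal, or how many of them there are.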

6 of 22

What is a “cell type”?

  • Originally defined in terms of function, location, tissue type, cell morphology
  • Later extended to
    • presence/absence of cell surface markers
    • gene expression (molecular profile)
  • Currently much less fixed; can also reflect:
    • cell cycle phase
    • migration state
    • differentiation: cell state

7 of 22

Why should we identify cell types?

  • Samples are heterogeneous (in general)
  • Tumor samples: how much do the cells differ from normal cell types?
  • Find new cell types that have been missed by using “standard” surface markers

  • Follow cell fate and determine cell differentiation mechanisms
  • To determine which cell types might communicate with each other
  • To compare the abundance of cell types in different conditions*

8 of 22

Change in cell type abundances: what are the new cells?

Mouse mesenteric lymphatic endothelial cells. From González-Loyola A et al. 2021, DOI: 10.1126/sciadv.abf4335

[Figure: cluster IDs shown for wild type cells and TF-knockout cells]
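Comparing abundances between conditions starts with the fraction of cells in each cluster per condition. The sketch below uses invented cluster labels (not data from the cited study), and the helper name `cluster_fractions` is hypothetical. A real comparison would also need replicates and a statistical test.

```python
from collections import Counter

# Made-up cluster ID per cell for two conditions.
wild_type = [0] * 50 + [1] * 40 + [2] * 10
knockout = [0] * 20 + [1] * 30 + [2] * 50

def cluster_fractions(cluster_ids):
    """Return the fraction of cells assigned to each cluster."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in sorted(counts.items())}

wt_frac = cluster_fractions(wild_type)
ko_frac = cluster_fractions(knockout)

# Change in fraction per cluster (knockout minus wild type).
change = {c: ko_frac.get(c, 0.0) - wt_frac.get(c, 0.0) for c in sorted(set(wt_frac) | set(ko_frac))}

for cluster, delta in change.items():
    print(f"cluster {cluster}: WT {wt_frac.get(cluster, 0.0):.2f}, KO {ko_frac.get(cluster, 0.0):.2f}, change {delta:+.2f}")
```

Fractions rather than raw counts are compared because the two samples rarely contain the same number of cells.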

9 of 22

Cell surface markers

  • Often considered the gold standard, especially in immunology
  • mRNA of cell surface markers is sometimes expressed at low levels or absent
  • Use a combination of such markers together with other genes that distinguish clusters (e.g. secreted proteins or transcription factors)

By Lokal_Profil, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=8797123

10 of 22

Markers of murine B cells and plasma cells. (Tellier J, et al., 2017)

11 of 22

Annotation methods

12 of 22

Manual vs automatic cell type annotation

  • Manual: using marker genes
    • What most people do…
    • Time consuming
    • Requires expert knowledge
    • Sometimes subjective and inaccurate

  • Automatic: requires a reference
    • Use complete cell type-specific mRNA expression profiles based on bulk RNAseq from FACS-sorted ’pure’ populations
    • OR: Use “a reference” of manually curated cells picked from scRNA-seq data sets
    • Can miss cell types if they are not included in the reference
  • Methods:
    • Assign a cell type per individual cell or per cluster of cells (better per cell)
    • Assignment of cell type via correlation of each cell/cluster to the “reference”
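The correlation-based assignment described above can be sketched as follows. This is a simplified illustration under several assumptions: the reference profiles and query cells are made up, Pearson correlation is used for brevity (actual tools may use a different correlation measure and extra refinement steps), and the `min_corr` threshold is a hypothetical parameter, not one taken from any particular package.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Made-up reference profiles over the same four genes (e.g. from sorted bulk populations).
reference = {
    "B cell": [9.0, 8.0, 0.5, 0.2],
    "T cell": [0.3, 0.5, 9.0, 7.0],
}

def annotate(profile, reference, min_corr=0.5):
    """Assign the best-correlated reference label, or 'unassigned' below min_corr."""
    scores = {label: pearson(profile, ref) for label, ref in reference.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_corr else "unassigned"), scores

# Made-up query profiles; these could be single cells or cluster averages.
queries = {
    "cell_1": [7.0, 9.0, 0.0, 1.0],
    "cell_2": [0.0, 1.0, 8.0, 8.0],
    "cell_3": [5.0, 0.2, 4.8, 0.3],  # resembles neither reference type
}

labels = {name: annotate(profile, reference)[0] for name, profile in queries.items()}
print(labels)
```

Without the threshold, `cell_3` would be forced onto whichever reference type happens to score highest, which is one way a cell type missing from the reference ends up mislabeled.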

13 of 22

14 of 22

Manual annotation using known marker genes

Microglia

Astrocytes

Human glioblastoma multiforme cells, 10x Genomics data (source of data to play with)

https://support.10xgenomics.com/single-cell-gene-expression/datasets/4.0.0/Parent_SC3v3_Human_Glioblastoma
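Manual annotation with marker genes amounts to asking which marker set is most highly expressed in each cluster. The sketch below uses commonly cited microglia and astrocyte markers as an illustrative choice. The per-cluster mean expression values are invented and do not come from the 10x dataset, and the function names are hypothetical.

```python
from statistics import mean

# Illustrative marker sets (an assumption for this sketch, not a curated list).
markers = {
    "Microglia": ["P2RY12", "CX3CR1", "TMEM119"],
    "Astrocytes": ["GFAP", "AQP4", "ALDH1L1"],
}

# Made-up mean expression per cluster (e.g. log-normalized values).
cluster_means = {
    "cluster_0": {"P2RY12": 3.1, "CX3CR1": 2.7, "TMEM119": 2.2, "GFAP": 0.1, "AQP4": 0.0, "ALDH1L1": 0.2},
    "cluster_1": {"P2RY12": 0.0, "CX3CR1": 0.1, "TMEM119": 0.0, "GFAP": 4.0, "AQP4": 2.5, "ALDH1L1": 1.8},
}

def score_cluster(expression, markers):
    """Average marker expression per candidate cell type; missing genes count as 0."""
    return {
        cell_type: mean(expression.get(gene, 0.0) for gene in genes)
        for cell_type, genes in markers.items()
    }

def label_clusters(cluster_means, markers):
    """Pick the cell type with the highest average marker expression for each cluster."""
    labels = {}
    for cluster, expression in cluster_means.items():
        scores = score_cluster(expression, markers)
        labels[cluster] = max(scores, key=scores.get)
    return labels

cluster_labels = label_clusters(cluster_means, markers)
print(cluster_labels)
```

In practice the scores are inspected alongside plots of the individual markers, since a single averaged score can hide a cluster that expresses only one gene of a set.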

15 of 22

Databases with cell type marker genes

  • PanglaoDB https://panglaodb.se/ (mouse and human)
    • Check out the rPanglaoDB R package: https://cran.r-project.org/web/packages/rPanglaoDB/index.html
  • (Regev et al.) single-cell RNA-seq atlas, also some mouse data

16 of 22

Databases with cell type marker genes

17 of 22

SingleR

Easy access to rich reference data:

  • HPCA: manually annotated Human Primary Cell Atlas (37 main types, 157 subtypes, 713 samples)
  • Blueprint + ENCODE (24 main types, 43 subtypes, 259 bulk RNA-seq samples)
  • Mouse: ImmGen and ‘mouse.rnaseq’ (brain-specific)

Classifies cells into both main types and subtypes; performs both single-cell-wise and cluster-wise annotation

18 of 22

Several methods are available

19 of 22

Check benchmarks!!

20 of 22

LLMs

21 of 22

Combine published scRNAseq datasets to create an atlas that can be used as a reference

ProjecTILs, an algorithm for reference atlas projection

Andreatta et al 2021 Nat. Comm. https://www.nature.com/articles/s41467-021-23324-4

https://github.com/carmonalab/ProjecTILs

22 of 22

Additional links

Slides adapted from https://sib-swiss.github.io/single-cell-training/day2/day2-4_cell_annotation.html