1 of 36

3-dimensional genome. Part II. Chromatin feature detection.

2 of 36

3D organisation of the chromatin revealed by Hi-C

Genome-wide Hi-C interaction map shows intrachromosomal (same as cis-) and interchromosomal (trans-) interactions.

3 of 36

Chromosome territories

  • At the highest level of spatial organization, trans-interactions are rare.
  • Individual chromosomes occupy distinct territories within the nucleus.

Bonev et al. Nature Reviews 2016

4 of 36

How to deal with interchromosomal contacts?

Cis-to-trans ratio: m1/m2, where m1 is the mean of cis-contacts and m2 is the mean of trans-contacts.

There is no strict definition of the cis-to-trans ratio. It can be calculated in different ways: as the average of all cis-contacts divided by the average of all trans-contacts; as a feature of a genomic bin (ICF); or as a value for each pair of chromosomes.

$\mathrm{ICF}_j = \mathrm{mean}(\mathrm{cis}) / \mathrm{mean}(\mathrm{trans})$ for bin $j$
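The two variants above (global ratio and per-bin ICF) can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the function name, the `chrom_of_bin` input layout and the choice to leave the main diagonal out of the cis average are all assumptions made for the example.

```python
import numpy as np

def cis_trans_stats(matrix, chrom_of_bin):
    """Return (global cis-to-trans ratio, per-bin ICF) for a dense contact matrix.

    matrix       : square symmetric array of contact frequencies
    chrom_of_bin : chromosome label of every bin (hypothetical input layout)
    """
    matrix = np.asarray(matrix, dtype=float)
    labels = np.asarray(chrom_of_bin)
    # True where both bins of a pair lie on the same chromosome
    cis_mask = labels[:, None] == labels[None, :]
    # Assumption: self-contacts (main diagonal) are left out of the cis average
    off_diag = ~np.eye(len(labels), dtype=bool)
    cis = np.where(cis_mask & off_diag, matrix, np.nan)
    trans = np.where(~cis_mask, matrix, np.nan)
    global_ratio = np.nanmean(cis) / np.nanmean(trans)           # m1 / m2
    icf = np.nanmean(cis, axis=1) / np.nanmean(trans, axis=1)    # one value per bin
    return global_ratio, icf

# Toy map: chromosome "a" and chromosome "b", two bins each
toy = np.array([[9, 4, 1, 1],
                [4, 9, 1, 1],
                [1, 1, 9, 4],
                [1, 1, 4, 9]])
ratio, icf = cis_trans_stats(toy, ["a", "a", "b", "b"])
```

On real data the result depends strongly on whether the matrix is balanced and on how the diagonal and masked bins are treated, so these choices should be reported together with the ratio.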

5 of 36

Intrachromosomal interactions: TADs

  • TADs – topologically associating domains: loci belonging to one TAD interact with each other more often than with loci from neighboring TADs

Bonev et al. Nature Reviews 2016

6 of 36

TAD detection

Which TADs are the correct ones?

7 of 36

TAD detection

8 of 36

Categories of TAD hierarchy callers

Xu, J., Xu, X., Huang, D. et al. Nat Commun 15, 4376 (2024). https://doi.org/10.1038/s41467-024-48593-7

9 of 36

  • Linear score methods give a view of TADs of different sizes by tuning a single parameter: the corner score, reciprocal insulation (RI), the window size, the size of a sliding diamond window and the average contact frequency within it, or, for Armatus and matryoshka, the resolution parameter γ.
    • TAD size is in most cases positively correlated with the value of that parameter, and small TADs obtained at low values usually lie inside large TADs obtained at high values.
  • Clustering methods treat TADs as a series of contiguous blocks on a chromosome. They iteratively merge neighboring TADs into a larger TAD, based on the similarity of interactions between contact domains, until a chromosome-arm size is reached, and regard the layer-by-layer clustering relationship as the TAD hierarchy.
  • Network feature methods assume that a TAD-like structure is the best structural separation of the chromosome, treating each TAD as a node and the relation between TADs as an edge. Based on the calculated edge weights, the nodes of the network are divided into distinct clusters, and each cluster covers a large TAD and the subTADs nested within it.
  • Structural entropy is defined over the coding tree of a graph by fixing and decoding the graph in a way that minimizes the uncertainty occurring in random walks.
    • The essence of the structural entropy algorithm is to fix the genomic loci at which the uncertainty of the structure is maximized.
    • The less information there is, the more likely it is that contact domains belong to the same TAD.
  • Statistical models characterize the TAD hierarchy and its biological properties by a certain statistical distribution, for example: a Gaussian mixture distribution (GMAP), a general mixed distribution combined with a generalized likelihood ratio test, or a probability distribution model with dynamic programming.

10 of 36

Armatus allows a wide range of TAD sizes.

Matryoshka is a widely used Armatus-based tool for hierarchical TAD calling.

HiCExplorer covers most of the genome, avoiding strange gaps.

11 of 36

Insulation score – a measure of local chromatin density

Convenient implementation: cooltools
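To make the idea of a sliding diamond window concrete, here is a minimal NumPy sketch of an insulation-like score. It is written for illustration only and is not the cooltools implementation: the function name, the use of the mean inside the diamond and the log2 normalization by the chromosome-wide average are assumptions of this example.

```python
import numpy as np

def insulation_score(matrix, window):
    """log2 of the mean contact frequency inside a sliding diamond,
    relative to the average of that value along the chromosome."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    raw = np.full(n, np.nan)  # bins too close to the ends stay NaN
    for i in range(window, n - window):
        # Contacts between the `window` bins upstream and the `window` bins
        # downstream of bin i, i.e. contacts that cross bin i
        diamond = matrix[i - window:i, i + 1:i + window + 1]
        raw[i] = diamond.mean()
    return np.log2(raw / np.nanmean(raw))

# Toy map: two 4-bin domains (value 10) on a background of 1
toy = np.ones((8, 8))
toy[:4, :4] = 10
toy[4:, 4:] = 10
scores = insulation_score(toy, window=2)
boundary_bin = int(np.nanargmin(scores))  # the minimum falls at the domain border
```

Local minima of the score mark positions that few contacts cross, which is why they are used as candidate TAD borders. The window size changes which borders are visible, so it is worth trying several values.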

12 of 36

Directionality index

A – upstream

B – downstream

$E = (A + B)/2$

+ HMM on top -> TAD borders
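The quantities A, B and E can be combined into a per-bin index as in the sketch below. The formula used here, DI = sign(B − A) · ((A − E)² + (B − E)²) / E, is the form commonly given for the directionality index (Dixon et al., 2012); it is not spelled out in the text above, so treat it as an assumption of this example. The function name and the truncated windows at the chromosome ends are also choices made for illustration, and the HMM step is not shown.

```python
import numpy as np

def directionality_index(matrix, window):
    """Directionality index per bin from a dense contact matrix."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = matrix[i, max(0, i - window):i].sum()   # A: contacts with upstream bins
        b = matrix[i, i + 1:i + 1 + window].sum()   # B: contacts with downstream bins
        e = (a + b) / 2.0                           # E: expectation without bias
        if e == 0 or a == b:
            continue  # no contacts or no bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 + (b - e) ** 2) / e
    return di

# Toy map: two 4-bin domains (value 10) on a background of 1
toy = np.ones((8, 8))
toy[:4, :4] = 10
toy[4:, 4:] = 10
di = directionality_index(toy, window=3)
```

In this toy map the index is negative at the last bin of the first domain and positive at the first bin of the second one. That switch from upstream to downstream bias is the signal a downstream segmentation step looks for.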

13 of 36

Reciprocal insulation

14 of 36

TAD detection

The number and size of TADs depend on:

  1. parameters provided by the user (gamma in Armatus, window size in IS)
  2. resolution
  3. organism
  4. data preprocessing (linear interpolation, restriction of high and zero values)

TADs are hierarchical -> the better the resolution, the smaller the TADs we can obtain

15 of 36

Approach | Caller | Input format | Main language | Number of parameters
--- | --- | --- | --- | ---
Linear Score | Arrowhead | hic format | Shell, Awk, Java | 1
Linear Score | Armatus | dense matrix, sparse matrix, Rao format* | C++, Python | 1
Linear Score | CaTCH | catch format** | C, R, Shell | 0
Linear Score | HiTAD | cool format*** | Python | 1
Linear Score | matryoshka | dense matrix, sparse matrix, Rao format | C++, Shell | 1
Linear Score | OnTAD | dense matrix, hic format | C++ | 2
Linear Score | Multi-CD | dense matrix | Matlab | NA
Clustering | IC-Finder | dense matrix, sparse matrix | Matlab | NA
Clustering | TADpole | dense matrix | R | 6
Clustering | BHi-Cect | Rao format | R | 0
Clustering | SpectralTAD | dense matrix, sparse matrix, hic format, cool format, Rao format | R | 3
Network features | HBM | dense matrix | R | 5
Network features | spectral | mat format**** | Matlab | NA
Network features | 3DNetMod | sparse matrix | Python | 18
Network features | GRiNCH | sparse matrix | C, Python | 3
Structural Entropy | deDoc | sparse matrix | Java | 0
Structural Entropy | SuperTAD | dense matrix, sparse matrix | C++ | 0
Statistical Model | TADtree | dense matrix | Python | 6
Statistical Model | GMAP | dense matrix, sparse matrix | R | 4
Statistical Model | PSYCHIC | dense matrix | Matlab, C | NA
Statistical Model | HiCKey | dense matrix, sparse matrix, Rao format | C++ | 6

16 of 36

17 of 36

Intrachromosomal interactions: compartments

Bonev et al. Nature Reviews 2016

18 of 36

Compartment detection:

19 of 36

20 of 36

Saddle plot

PCA applied to the matrix: the sign of PC1 defines compartment membership.
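A minimal sketch of this step is shown below. It assumes that the input is already an observed/expected matrix (distance decay removed), which the text above does not specify, and it uses the leading eigenvector of the correlation matrix as PC1; the function name is hypothetical.

```python
import numpy as np

def compartment_signal(obs_over_exp):
    """Leading eigenvector of the bin-by-bin correlation matrix.

    Assumption: `obs_over_exp` is a dense observed/expected matrix
    with no empty (constant) rows.
    """
    oe = np.asarray(obs_over_exp, dtype=float)
    corr = np.corrcoef(oe)                    # Pearson correlation between bin profiles
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues come in ascending order
    return eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

# Toy checkerboard: bins of the same compartment are enriched (2.0),
# bins of different compartments are depleted (0.5)
labels = np.array([0, 0, 1, 1, 1, 0, 0, 1])
same = labels[:, None] == labels[None, :]
toy_oe = np.where(same, 2.0, 0.5)
pc1 = compartment_signal(toy_oe)
membership = pc1 > 0   # two groups of bins; which one is "A" is not yet decided
```

The overall sign of an eigenvector is arbitrary, so the two groups still have to be oriented with an external track (for example GC content or gene density) before they are named A and B.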

21 of 36

Saddle plot

22 of 36

Compartment strength

How to estimate the difference in compartments?

A more accurate method:

for each bin, the average normalized frequency of interactions with bins belonging to the same compartment is divided by the average normalized frequency of contacts with bins from the other compartment.

Mean of "reds" divided by mean of "greens"
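The per-bin ratio described above can be sketched as follows. The function name, the boolean `is_a` input and the exclusion of self-contacts are assumptions of this example, and the matrix is assumed to be normalized (for example observed/expected) beforehand.

```python
import numpy as np

def compartment_strength(norm_matrix, is_a):
    """Per-bin ratio: mean contacts with the same compartment
    divided by mean contacts with the other compartment."""
    m = np.asarray(norm_matrix, dtype=float)
    is_a = np.asarray(is_a, dtype=bool)
    same = is_a[:, None] == is_a[None, :]
    off_diag = ~np.eye(len(is_a), dtype=bool)  # assumption: skip self-contacts
    same_mean = np.nanmean(np.where(same & off_diag, m, np.nan), axis=1)
    other_mean = np.nanmean(np.where(~same, m, np.nan), axis=1)
    return same_mean / other_mean

# Toy checkerboard: same-compartment pairs 2.0, different-compartment pairs 0.5
labels = np.array([0, 0, 1, 1, 1, 0, 0, 1])
same_pair = labels[:, None] == labels[None, :]
toy = np.where(same_pair, 2.0, 0.5)
strength = compartment_strength(toy, labels == 0)
```

Comparing the distribution of these per-bin values between two samples gives a simple estimate of how compartmentalization differs between them.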

23 of 36

Subcompartment calling

  • Classical approach:

detection via cis-contacts (in fact, TAD clustering)

"resolution enhancement" using neural network approaches, such as an autoencoder

requires high resolution (~5 billion contacts for human data; a typical current dataset is nearly 10 times smaller)

24 of 36

Calder

The problem: compartments are not just TAD interactions. Moreover, in regions with a high rate of extrusion, TADs and compartmental structure oppose each other. The cis-based approach may not be applicable in this case.

25 of 36

Compartmental and extrusion TADs

compartmental

extrusion

26 of 36

Intrachromosomal interactions: loops

Loop callers: cooltools (Python implementation of HiCCUPS), Mustache and others

27 of 36

Loop callers

Algorithms usually consider 2 features:

  • whole-genome enrichment: selection of candidate interactions based on a model distribution of contacts (often negative binomial)
  • local enrichment: comparison of the prominence of the selected contact with that of neighbouring ones

Other approaches (Mustache) rely on Gaussian smoothing: the algorithm removes noise and fine detail and keeps only the brightest features, such as loops.

+ Mustache, SIP, etc.

Personal preference (probably not the best): cooltools, chromosight
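The local enrichment idea can be illustrated with a small NumPy sketch that compares one pixel with a ring of surrounding pixels. This is a simplified square ring written for this example, not the kernel set or the statistics used by HiCCUPS, cooltools or any other caller; the function name and the `inner`/`outer` parameters are hypothetical.

```python
import numpy as np

def local_enrichment(matrix, i, j, inner=1, outer=3):
    """Ratio of pixel (i, j) to the mean of a square ring around it."""
    matrix = np.asarray(matrix, dtype=float)
    n_rows, n_cols = matrix.shape
    if min(i, j) - outer < 0 or i + outer >= n_rows or j + outer >= n_cols:
        raise ValueError("pixel is too close to the matrix edge for this window")
    window = matrix[i - outer:i + outer + 1, j - outer:j + outer + 1]
    ring = np.ones(window.shape, dtype=bool)
    # Exclude the central square so the peak does not inflate its own background
    ring[outer - inner:outer + inner + 1, outer - inner:outer + inner + 1] = False
    return matrix[i, j] / window[ring].mean()

# Toy map: flat background with one bright pixel
toy = np.ones((20, 20))
toy[5, 14] = 8.0
peak_score = local_enrichment(toy, 5, 14)
background_score = local_enrichment(toy, 5, 6)
```

Real callers add a distance-dependent expected value and a statistical test on top of such ratios, so a high ratio alone should be read as a candidate, not as a called loop.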

28 of 36

Significant contacts

Apart from loops, there is a similar but distinct type of contact: significant interactions on the Hi-C map.

These can be promoter-promoter, promoter-enhancer or polycomb interactions.

Loop

Significant contact

Tool for detecting significant contacts: FitHiC2

https://github.com/ay-lab/fithic

29 of 36

FitHiC application example after additional filtering on a specific histone modification: FitHiC + H3K27me3

Without additional filtering, the result usually looks more random.

30 of 36

Resolution

Feature | human, mouse | fruit fly
--- | --- | ---
TADs | 10 kb for good quality, 40 kb for worse quality | 4-5 kb for good quality, 10 kb for worse quality
Loops | ~10 kb | ~5 kb (loops are rare in fruit fly)
Compartments | 100-250 kb, depends on quality and aims | 10-20 kb
Subcompartments | 50 kb | ?
Significant contacts: promoter-promoter, promoter-enhancer interactions | 2-5 kb for good quality, 10-20 kb for worse quality | 2-5 kb
Significant contacts: polycomb interactions | 100 kb | 10 kb

Today’s data resolution is often not enough for feature detection

31 of 36

FIREs: frequently interacting regions

Scale of interactions: ±200kb for human genome
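As a rough illustration of what "frequently interacting" means at this scale, the sketch below sums, for every bin, its cis-contacts with bins up to 200 kb away. It only shows the raw local contact sum: the normalization and significance steps that a dedicated tool applies are not reproduced, and the function name and parameters are hypothetical.

```python
import numpy as np

def local_contact_sum(matrix, bin_size, max_dist=200_000):
    """Total cis-contacts of each bin with bins within +/- max_dist (in bp)."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    reach = max_dist // bin_size   # number of bins covered on each side
    totals = np.zeros(n)
    for i in range(n):
        lo, hi = max(0, i - reach), min(n, i + reach + 1)
        totals[i] = matrix[i, lo:hi].sum() - matrix[i, i]  # leave out the self-contact
    return totals

# Toy map at 40 kb resolution: bin 4 interacts 5x more with everything
toy = np.ones((10, 10))
toy[4, :] = 5
toy[:, 4] = 5
totals = local_contact_sum(toy, bin_size=40_000)
top_bin = int(np.argmax(totals))
```

Bins near the chromosome ends see fewer neighbours in this sketch, so their sums are not directly comparable with those of internal bins.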

FIREcaller:

32 of 36

How to create beautiful average plots?

  • By hand ☺
  • Coolpuppy (https://github.com/open2c/coolpuppy)
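For the "by hand" option, an average plot is the element-wise mean of equally sized snippets cut out around a list of features. The sketch below assumes a dense matrix and features given as (row bin, column bin) pairs; the function name and arguments are hypothetical, and no expected-value normalization is applied.

```python
import numpy as np

def pileup(matrix, anchors, flank):
    """Average (2*flank+1) x (2*flank+1) snippets centred on each (i, j) anchor."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    snippets = []
    for i, j in anchors:
        if min(i, j) - flank < 0 or max(i, j) + flank >= n:
            continue  # skip features too close to the matrix edge
        snippets.append(matrix[i - flank:i + flank + 1, j - flank:j + flank + 1])
    if not snippets:
        raise ValueError("no anchor fits inside the matrix with this flank")
    return np.mean(snippets, axis=0)

# Toy map: flat background with two bright pixels
toy = np.ones((30, 30))
toy[5, 12] = 6.0
toy[10, 20] = 6.0
# The third anchor lies too close to the edge and is skipped
avg = pileup(toy, [(5, 12), (10, 20), (1, 28)], flank=2)
```

The resulting array can be passed to any heatmap function. For real data, snippets are usually normalized by the expected contact frequency before averaging, otherwise distance decay dominates the picture.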

33 of 36

Bad and good quality example

Same resolution, 40 kb

34 of 36

Rabl: centromere and telomere interactions

An example of telomere-centromere interactions in budding yeast

This is a model conformation for centromere-centromere interactions

35 of 36

Why subcompartments can be detected from trans-interactions

Trans-interactions reveal compartment structure

36 of 36

Practice: cooltools
