1 of 25

2 of 25

3 of 25

ATG

Coding DNA

Non-coding DNA

4 of 25

cis

trans

5 of 25

6 of 25

What is a transcription factor?

A transcription factor is a protein that regulates transcription

- after nuclear translocation

- by specific interaction with DNA

- or by stoichiometric interaction with a protein that can be assembled into a sequence-specific DNA-protein complex.

7 of 25

Transcription factors

Sequence-specific DNA binding

Non-DNA binding

TF1

TF2

TF3

TF4

adapter

Co-activator

HAT

DNA

Layer I

Layer III

Layer II

8 of 25

Structure of transcription factors

  • DNA-binding domain
  • Activation domain
  • Oligomerization domain
  • Ligand-binding domain
  • Protein-protein interaction domain

9 of 25

10 of 25

AP-1 (human)

11 of 25

Search for new TF binding sites with PWMs

AP-1

Motif databases:

  • HOCOMOCO
  • JASPAR
  • FactorBook
  • UniPROBE
  • TRANSFAC (10290 PWMs)

Motif analysis tools:

  • MATCH™
  • HOMER

12 of 25

Alignment of binding site sequences with conserved, ungapped region

Making a TRANSFAC® matrix from a TFBS alignment

TFBS matrices

13 of 25

Alignment of binding site sequences with conserved, ungapped region

Position   1  2  3  4  5  6  7  8
A          0  8  0  8  8  6  3  2
C          0  0  0  0  0  0  1  2
G          0  0  0  0  0  1  2  3
T          8  0  8  0  0  1  2  1
Consensus  T  A  T  A  A  A  A  G

Position-specific count matrix

Making a TRANSFAC® matrix from a TFBS alignment

Position   1    2    …    7    8
A          0/8  8/8  …    3/8  2/8
C          0/8  0/8  …    1/8  2/8
G          0/8  0/8  …    2/8  3/8
T          8/8  0/8  …    2/8  1/8

Position-specific frequency matrix

TFBS matrices

14 of 25

TFBS prediction with a PWM

(using residue counts to score potential binding sites)

Position   1  2  3  4  5  6  7  8
A          0  8  0  8  8  6  3  2
C          0  0  0  0  0  0  1  2
G          0  0  0  0  0  1  2  3
T          8  0  8  0  0  1  2  1

A C T T G G T A C G T A

Score: 8 + 8 + 8 + 0 + 8 + 1 + 1 + 3 = 37
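The count-based scoring on this slide can be sketched in a few lines of Python. The matrix is the 8-position count matrix from the slide. The function names and the example site `TATGATCG` are hypothetical: the site was chosen only because its per-position counts are 8, 8, 8, 0, 8, 1, 1 and 3. This is a minimal illustration, not the scoring used by any particular tool.

```python
# Position-specific count matrix from the slide (8 aligned sites, 8 positions).
COUNTS = {
    "A": [0, 8, 0, 8, 8, 6, 3, 2],
    "C": [0, 0, 0, 0, 0, 0, 1, 2],
    "G": [0, 0, 0, 0, 0, 1, 2, 3],
    "T": [8, 0, 8, 0, 0, 1, 2, 1],
}
MOTIF_LEN = 8


def count_score(site):
    """Sum the matrix counts of the residues observed at each position."""
    if len(site) != MOTIF_LEN:
        raise ValueError(f"expected a site of length {MOTIF_LEN}")
    return sum(COUNTS[base][pos] for pos, base in enumerate(site.upper()))


def scan(sequence):
    """Yield (offset, window, score) for every full-length window."""
    for start in range(len(sequence) - MOTIF_LEN + 1):
        window = sequence[start:start + MOTIF_LEN]
        yield start, window, count_score(window)


# Hypothetical site whose terms are 8 + 8 + 8 + 0 + 8 + 1 + 1 + 3.
print(count_score("TATGATCG"))  # 37
# The consensus reaches the maximum possible score for this matrix.
print(count_score("TATAAAAG"))  # 52
```

Sliding `scan` over a longer sequence and keeping the windows above a chosen cut-off gives the list of predicted sites.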

TFBS matrices

15 of 25

Describing DNA-sequence patterns using PWM

(or, more precisely, a position-specific frequency matrix (PFM))

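Position  1 2 3 4 5 6 7

Seq1      T G A C T G A
Seq2      T G A A T C A
Seq3      T G A A A C A
Seq4      T T A C T C A
Seq5      T G C C T C A

Counts n_ik:

     1  2  3  4  5  6  7
A    0  0  4  2  1  0  5
C    0  0  1  3  0  4  0
G    0  4  0  0  0  1  0
T    5  1  0  0  4  0  0
Sum  5  5  5  5  5  5  5

Frequencies p_ik = n_ik / N:

     1    2    3    4    5    6    7
A    0.0  0.0  0.8  0.4  0.2  0.0  1.0
C    0.0  0.0  0.2  0.6  0.0  0.8  0.0
G    0.0  0.8  0.0  0.0  0.0  0.2  0.0
T    1.0  0.2  0.0  0.0  0.8  0.0  0.0

Motif logo of TRANSFAC matrix: V$AP1_02

L = 7 (motif length), N = 5 (number of aligned sequences)

n_ik: count of nucleotide i at position k (k = 1..L; i = A, C, G, T)

p_ik: frequency of nucleotide i at position k (k = 1..L; i = A, C, G, T)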
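The step from the five aligned sequences to the counts n_ik and the frequencies p_ik = n_ik / N can be sketched as follows. The function names are hypothetical; the input is the alignment from the slide.

```python
from collections import Counter

ALPHABET = "ACGT"

# The five aligned binding site sequences from the slide.
sites = ["TGACTGA", "TGAATCA", "TGAAACA", "TTACTCA", "TGCCTCA"]


def count_matrix(aligned):
    """Return n[i][k]: how often nucleotide i occurs at position k."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sites must be ungapped and of equal length")
    # One Counter per alignment column; missing bases count as 0.
    columns = [Counter(s[k] for s in aligned) for k in range(length)]
    return {base: [col[base] for col in columns] for base in ALPHABET}


def frequency_matrix(counts, n_sites):
    """Return p[i][k] = n[i][k] / N."""
    return {base: [c / n_sites for c in row] for base, row in counts.items()}


n = count_matrix(sites)
p = frequency_matrix(n, len(sites))

for base in ALPHABET:
    print(base, n[base], p[base])
```

Each column of `n` sums to N = 5, and each column of `p` sums to 1, which is a quick sanity check on any matrix built this way.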

 

Binding site scores:
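One common textbook way to turn such a matrix into binding site scores is a log-odds weight per position, log2(p_ik / b_i), summed over the site. The sketch below is an illustration under stated assumptions, not necessarily the score used by TRANSFAC® or MATCH™: the uniform background b_i = 0.25, the pseudocount of 1 per column and all function names are assumptions.

```python
import math

# Count matrix of the five aligned sites from the slide (rows A, C, G, T).
COUNTS = {
    "A": [0, 0, 4, 2, 1, 0, 5],
    "C": [0, 0, 1, 3, 0, 4, 0],
    "G": [0, 4, 0, 0, 0, 1, 0],
    "T": [5, 1, 0, 0, 4, 0, 0],
}
N_SITES = 5
PSEUDOCOUNT = 1.0                        # assumption: avoids log(0) for unseen bases
BACKGROUND = {b: 0.25 for b in "ACGT"}   # assumption: uniform background


def log_odds_matrix():
    """Return w[i][k] = log2(smoothed p_ik / b_i)."""
    weights = {}
    for base, row in COUNTS.items():
        b = BACKGROUND[base]
        weights[base] = [
            math.log2(((n + PSEUDOCOUNT * b) / (N_SITES + PSEUDOCOUNT)) / b)
            for n in row
        ]
    return weights


WEIGHTS = log_odds_matrix()


def log_odds_score(site):
    """Sum the position weights of the residues in the site."""
    return sum(WEIGHTS[base][k] for k, base in enumerate(site.upper()))


for site in ("TGACTCA", "TGTAATA", "AAAAAAA"):
    print(site, round(log_odds_score(site), 2))
```

A positive score means the site looks more like the motif than like background under these assumptions; a negative score means the opposite.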

16 of 25

HOMER

17 of 25

Calculation of optimized cut-offs for PWMs

[Figure: V$E2F_Q6 PWM cut-offs. FP and FN rates (%) plotted against the q-score, with the minFN, minFP and minSUM cut-offs marked.]
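The idea behind a minSUM-style cut-off can be sketched as follows, reading minSUM as the cut-off that minimises the sum of the false-negative (FN) and false-positive (FP) rates. That reading, the function names and the score lists are assumptions for illustration; the scores are invented and are not taken from V$E2F_Q6.

```python
def error_rates(cutoff, site_scores, background_scores):
    """Return (FN rate, FP rate) in percent for a given score cut-off."""
    # A true site is missed when it scores below the cut-off.
    fn = sum(s < cutoff for s in site_scores) / len(site_scores)
    # A background window is a false hit when it reaches the cut-off.
    fp = sum(s >= cutoff for s in background_scores) / len(background_scores)
    return 100 * fn, 100 * fp


def min_sum_cutoff(site_scores, background_scores):
    """Pick the candidate cut-off with the smallest FN + FP."""
    candidates = sorted(set(site_scores) | set(background_scores))
    return min(
        candidates,
        key=lambda c: sum(error_rates(c, site_scores, background_scores)),
    )


# Hypothetical scores in [0, 1] for known sites and background windows.
true_sites = [0.95, 0.91, 0.88, 0.84, 0.80, 0.72]
background = [0.40, 0.55, 0.61, 0.66, 0.70, 0.75, 0.82, 0.58]

cutoff = min_sum_cutoff(true_sites, background)
print(cutoff, error_rates(cutoff, true_sites, background))
```

Lowering the cut-off trades false negatives for false positives, and raising it does the reverse, so cut-offs that favour one error type over the other can be chosen from the same two rate curves.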

18 of 25

19 of 25

Mouse Interleukin-2 gene promoter

tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt… …

Upper-case region (-96 to -79): NF-ATp site followed by AP-1 site

AP-1 consensus: TGAGTCA

Score = 0.74

The score of a real site can be very low; binding is then compensated by TF interaction.

20 of 25

21 of 25


info@genexplain.com | www.genexplain.com

Motif enrichment analysis

  • Use cases:
    • Find transcription factors for a co-regulated gene set
    • Identify motifs “co-enriched” in the mutated regions

Yes sequences

No sequences

  • Which binding sites are significantly enriched in the Yes sequences?
  • Site optimization tool
  • Automatic threshold optimization
  • One-sided binomial test for significant enrichment of sites
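The one-sided binomial test named in the last bullet can be sketched with the standard library alone. The sketch assumes that the rate of site-containing sequences in the No set estimates the null probability and that sequences, not individual sites, are the unit being counted. Both choices, the function names and the counts are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_p_value(yes_hits, yes_total, no_hits, no_total):
    """Test whether hits are more frequent in Yes than expected from No."""
    background_rate = no_hits / no_total  # assumption: No set gives the null rate
    return binomial_tail(yes_hits, yes_total, background_rate)


# Hypothetical counts of sequences with at least one predicted site.
p_value = enrichment_p_value(yes_hits=30, yes_total=100, no_hits=100, no_total=1000)
print(f"{p_value:.3g}")
```

A small p-value means that the Yes set contains more hits than the No set rate would explain. When many motifs are tested, the p-values would also need a multiple-testing correction.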

22 of 25

23 of 25

24 of 25

Fuzzy puzzle!

DNA

Gene

Silent

Repressor

DNA

Gene

Hyperactive

Repressor

25 of 25

Problem of the choice of background sequences.

..AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA..