[Figure labels: ATG; coding DNA; non-coding DNA; …; cis; trans]
What is a transcription factor?
A transcription factor is a protein that regulates transcription:
- after nuclear translocation
- by specific interaction with DNA
- or by stoichiometric interaction with a protein that can be assembled into a sequence-specific DNA-protein complex.
Transcription factors
[Figure: transcription factors grouped as sequence-specific DNA-binding or non-DNA-binding. Labels: TF1, TF2, TF3, TF4, adapter, co-activator, HAT, DNA; Layer I, Layer II, Layer III]
Structure of transcription factors
- DNA-binding domain
- Activation domain
- Oligomerization domain
- Ligand-binding domain
- Protein-protein interaction domain
AP-1 (human)
…
Search for new TF binding sites with PWMs
AP-1
Motif databases:
Motif analysis tools:
TFBS matrices
Making a TRANSFAC® matrix from a TFBS alignment
Alignment of binding site sequences with conserved, ungapped region
Position   1 2 3 4 5 6 7 8
A          0 8 0 8 8 6 3 2
C          0 0 0 0 0 0 1 2
G          0 0 0 0 0 1 2 3
T          8 0 8 0 0 1 2 1
Consensus  T A T A A A A G
Position-specific count matrix
Making a TRANSFAC® matrix from a TFBS alignment
Position   1    2    …   7    8
A          0/8  8/8  …   3/8  2/8
C          0/8  0/8  …   1/8  2/8
G          0/8  0/8  …   2/8  3/8
T          8/8  0/8  …   2/8  1/8
Position-specific frequency matrix
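The step from counts to frequencies can be sketched in a few lines of Python. This is only an illustration, not the TRANSFAC® procedure itself: the counts are the ones shown for the eight aligned sites, while the names `COUNTS`, `counts_to_frequencies`, and `FREQS` are made up for this example. Exact fractions are used so that the values stay comparable to the n/8 notation on the slide.

```python
from fractions import Fraction

# Position-specific count matrix from the slide (8 aligned sites, 8 positions).
COUNTS = {
    "A": [0, 8, 0, 8, 8, 6, 3, 2],
    "C": [0, 0, 0, 0, 0, 0, 1, 2],
    "G": [0, 0, 0, 0, 0, 1, 2, 3],
    "T": [8, 0, 8, 0, 0, 1, 2, 1],
}


def counts_to_frequencies(counts):
    """Divide every count by its column total to get per-position frequencies."""
    length = len(next(iter(counts.values())))
    # Column totals; each equals the number of aligned sites (8 here).
    totals = [sum(counts[base][k] for base in counts) for k in range(length)]
    return {
        base: [Fraction(row[k], totals[k]) for k in range(length)]
        for base, row in counts.items()
    }


FREQS = counts_to_frequencies(COUNTS)
for base, row in FREQS.items():
    print(base, [str(f) for f in row])
```

Note that `Fraction` reduces its values, so 2/8 is displayed as 1/4 and 8/8 as 1.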
TFBS prediction with a PWM
(using residue counts to score potential binding sites)
Position   1 2 3 4 5 6 7 8
A          0 8 0 8 8 6 3 2
C          0 0 0 0 0 0 1 2
G          0 0 0 0 0 1 2 3
T          8 0 8 0 0 1 2 1
Sequence letters: A C T T G G T A C G T A
Score: 8 + 8 + 8 + 0 + 8 + 1 + 1 + 3 = 37
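Count-based scoring can be sketched as follows: for each position of a candidate site, look up the count of the observed base and add the values. The matrix is the count matrix from the slide. Everything else is an assumption made for illustration: the function names, the flanked test sequence, and the 8-mer `TATCAGCG`, which is a hypothetical site chosen only because it reproduces the term pattern 8 + 8 + 8 + 0 + 8 + 1 + 1 + 3.

```python
# Position-specific count matrix from the slide (8 positions).
COUNTS = {
    "A": [0, 8, 0, 8, 8, 6, 3, 2],
    "C": [0, 0, 0, 0, 0, 0, 1, 2],
    "G": [0, 0, 0, 0, 0, 1, 2, 3],
    "T": [8, 0, 8, 0, 0, 1, 2, 1],
}


def score_site(site, counts=COUNTS):
    """Sum the matrix counts of the bases observed at each position of the site."""
    width = len(counts["A"])
    if len(site) != width:
        raise ValueError(f"site must be {width} bases long")
    return sum(counts[base][k] for k, base in enumerate(site.upper()))


def scan(sequence, counts=COUNTS):
    """Score every window of matrix width along the forward strand."""
    width = len(counts["A"])
    seq = sequence.upper()
    return [
        (start, seq[start:start + width], score_site(seq[start:start + width], counts))
        for start in range(len(seq) - width + 1)
    ]


# Hypothetical site, not taken from the slide.
print(score_site("TATCAGCG"))

# Best-scoring window in a hypothetical sequence containing the consensus.
print(max(scan("ccTATAAAAGcc"), key=lambda hit: hit[2]))
```

The sketch handles only the bases A, C, G, and T and only the forward strand; a real scanner would also score the reverse complement and usually works with weights derived from the frequencies rather than with raw counts.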
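Describing DNA-sequence patterns using a PWM
(or, more precisely, a position-specific frequency matrix (PFM))
Position 1234567
Seq1     TGACTGA
Seq2     TGAATCA
Seq3     TGAAACA
Seq4     TTACTCA
Seq5     TGCCTCA
     Counts            Frequencies
     1 2 3 4 5 6 7     1   2   3   4   5   6   7
A    0 0 4 2 1 0 5     0.0 0.0 0.8 0.4 0.2 0.0 1.0
C    0 0 1 3 0 4 0     0.0 0.0 0.2 0.6 0.0 0.8 0.0
G    0 4 0 0 0 1 0     0.0 0.8 0.0 0.0 0.0 0.2 0.0
T    5 1 0 0 4 0 0     1.0 0.2 0.0 0.0 0.8 0.0 0.0
Sum  5 5 5 5 5 5 5
Motif logo of TRANSFAC matrix: V$AP1_02
L = 7 (motif length)
N = 5 (number of aligned sites)
$n_{ik}$: count of base $i$ at position $k$, with $k = 1, \dots, L$ and $i \in \{A, C, G, T\}$
$p_{ik} = n_{ik} / N$: frequency of base $i$ at position $k$, with $k = 1, \dots, L$ and $i \in \{A, C, G, T\}$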
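As a check on the matrices, the counts $n_{ik}$ and frequencies $p_{ik} = n_{ik} / N$ can be recomputed directly from the five aligned sequences. The sketch below assumes nothing beyond the alignment itself; the names `SITES`, `count_matrix`, and `frequency_matrix` are illustrative. Position 5 contains one A (from Seq3) and no C, so the frequency of A there comes out as 0.2.

```python
# The five aligned AP-1 sites from the slide (L = 7, N = 5).
SITES = ["TGACTGA", "TGAATCA", "TGAAACA", "TTACTCA", "TGCCTCA"]
BASES = "ACGT"


def count_matrix(sites):
    """n_ik: number of sites with base i at position k."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length (ungapped alignment)")
    return {
        base: [sum(1 for site in sites if site[k] == base) for k in range(length)]
        for base in BASES
    }


def frequency_matrix(counts, n_sites):
    """p_ik = n_ik / N."""
    return {base: [n / n_sites for n in row] for base, row in counts.items()}


N_IK = count_matrix(SITES)
P_IK = frequency_matrix(N_IK, len(SITES))

for base in BASES:
    print(base, N_IK[base], P_IK[base])
```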
Binding site scores:
HOMER
Calculation of optimized cut-offs for PWMs
[Figure: V$E2F_Q6 PWM cut-offs. Labels: %, FP, FN, q-score, minFN, minFP, minSUM]
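The trade-off between false positives (FP) and false negatives (FN) can be illustrated with a small sketch. It rests on assumptions that the slide does not state: that FN is the share of known sites scoring below the cut-off, that FP is the share of background hits scoring at or above it, and that a minSUM-style cut-off is the one minimizing FN + FP. All scores below are invented for illustration and are not V$E2F_Q6 data.

```python
# Hypothetical scores, invented for illustration only.
TRUE_SITE_SCORES = [0.95, 0.90, 0.85, 0.80, 0.70]
BACKGROUND_SCORES = [0.75, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25]


def error_rates(cutoff, true_scores=TRUE_SITE_SCORES, background=BACKGROUND_SCORES):
    """Return (FN rate, FP rate) for one cut-off, under the assumptions above."""
    fn = sum(score < cutoff for score in true_scores) / len(true_scores)
    fp = sum(score >= cutoff for score in background) / len(background)
    return fn, fp


def min_sum_cutoff(true_scores=TRUE_SITE_SCORES, background=BACKGROUND_SCORES):
    """Pick the candidate cut-off with the smallest FN + FP (assumed minSUM reading)."""
    candidates = sorted(set(true_scores) | set(background))
    return min(candidates, key=lambda c: sum(error_rates(c, true_scores, background)))


best = min_sum_cutoff()
print(best, error_rates(best))
```

Raising the cut-off in this toy data removes the last background hit but starts losing real sites, which is the same tension the FP and FN curves express.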
Mouse Interleukin-2 gene promoter
tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt… …
[Figure annotations: positions -96 and -79; ST; NF-ATp; AP-1]
AP-1 consensus: TGAGTCA
The score of a real site can be very low; binding is compensated by TF interaction.
Score = 0.74
info@genexplain.com | www.genexplain.com
Motif enrichment analysis
Yes sequences
No sequences
Fuzzy puzzle!
[Figure, two panels, each labelled DNA, Gene, and Repressor: in one the gene is silent, in the other it is hyperactive]
Problem of the choice of background sequences.
..AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA..
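A minimal sketch of motif enrichment analysis, under simplifying assumptions that are not from the slides: a sequence counts as a hit if it contains the exact AP-1 consensus TGAGTCA on the forward strand, and enrichment in the Yes set is judged with a one-sided hypergeometric test. The sequences and the function names are hypothetical.

```python
from math import comb


def has_motif(sequence, motif="TGAGTCA"):
    """Exact, forward-strand match; a PWM score threshold would be used in practice."""
    return motif in sequence.upper()


def enrichment(yes_seqs, no_seqs, motif="TGAGTCA"):
    """Return (hits in Yes, hits in No, P(at least this many Yes hits by chance))."""
    yes_hits = sum(has_motif(s, motif) for s in yes_seqs)
    no_hits = sum(has_motif(s, motif) for s in no_seqs)
    total = len(yes_seqs) + len(no_seqs)
    total_hits = yes_hits + no_hits
    n_yes = len(yes_seqs)
    # Hypergeometric upper tail: draw n_yes sequences out of total,
    # of which total_hits contain the motif.
    tail = sum(
        comb(total_hits, x) * comb(total - total_hits, n_yes - x)
        for x in range(yes_hits, min(total_hits, n_yes) + 1)
    )
    return yes_hits, no_hits, tail / comb(total, n_yes)


# Hypothetical sequence sets, invented for illustration.
YES = ["ccTGAGTCAgg", "aaTGAGTCAtt", "gcTGAGTCAcg", "ttttacgtacg"]
NO = ["acgtacgtacg", "ggTGAGTCAcc", "aaaaaaaaaaa", "cgcgcgcgcgc"]

yes_hits, no_hits, p_value = enrichment(YES, NO)
print(yes_hits, no_hits, round(p_value, 4))
```

The result depends heavily on the No set: a background made of low-complexity sequence, such as a poly-A run, would almost never contain the motif, so nearly any motif would appear enriched against it.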