Markov Model
Assignment Guide
How does predictive text work?
Markov Model
So what is a Markov Model?
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
There are two phases to this process:
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
There are two phases to this process:
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
There are two phases to this process:
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
There are two phases to this process:
Side note: This is how a lot of artificial intelligence and machine learning works.
Markov Model
So what is a Markov Model?
It takes, as input, a writing sample.
It generates text in the same "style" as the input.
There are two phases to this process:
The biggest difference between this assignment and our previous assignments is the constructor does a lot of work.
Constructor
As mentioned, the constructor for this program is where a lot of the magic happens.
Constructor
As mentioned, the constructor for this program is where a lot of the magic happens.
The constructor uses a data structure you'll learn more about called a symbol table (ST.java).
Constructor
As mentioned, the constructor for this program is where a lot of the magic happens.
The constructor uses a data structure you'll learn more about called a symbol table (ST.java).
A symbol table is like a encyclopedia, you can look something up and find out more information about it.
Constructor
As mentioned, the constructor for this program is where a lot of the magic happens.
The constructor uses a data structure you'll learn more about called a symbol table (ST.java).
A symbol table is like a encyclopedia, you can look something up and find out more information about it.
For this program, we're using our "encyclopedia" to look up how often a specific letter appears after a given word.
K-grams
We have some input text. We divide it into "k-grams".
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
4?
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
4?
Nope.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
4?
Nope.
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | | | |
| | | |
| | | |
| | | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | | | |
am_ | | | |
| | | |
| | | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | | | |
am_ | | | |
m_s | | | |
| | | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | | | |
am_ | | | |
m_s | | | |
_sa | | | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | | |
am_ | | | |
m_s | | | |
_sa | | | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | | |
am_ | | | |
m_s | | | |
_sa | | | |
Skipping ahead...
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | her | pal |
am_ | aw_ | er_ | |
m_s | w_h | r_p | |
_sa | _he | _pa | |
13?
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | her | pal |
am_ | aw_ | er_ | al. |
m_s | w_h | r_p | |
_sa | _he | _pa | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | her | pal |
am_ | aw_ | er_ | al. |
m_s | w_h | r_p | |
_sa | _he | _pa | |
14?
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | her | pal |
am_ | aw_ | er_ | al. |
m_s | w_h | r_p | l.S |
_sa | _he | _pa | |
K-grams
We have some input text. We divide it into "k-grams".
A k-gram is a series of k adjacent characters.
How many k-grams are in this text if k is 3?
Sam saw her pal.
Sam | saw | her | pal |
am_ | aw_ | er_ | al. |
m_s | w_h | r_p | l.S |
_sa | _he | _pa | .Sa |
Character Frequencies
We read our input text, k characters at a time.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 0 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 0 |
You need a column for every possible character.
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 0 | 0 | 0 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 1 | 0 | 0 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 1 | 0 | 0 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 1 | 0 | 0 |
na | 0 | 0 | 0 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 1 | 0 | 0 |
na | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 1 | 0 | 0 |
na | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 2 | 0 | 0 |
na | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 2 | 0 | 0 |
na | 0 | 0 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 2 | 0 | 0 |
na | 0 | 1 | 1 |
Character Frequencies
We read our input text, k characters at a time.
For each k-gram, we record the character after it.
We're building a table of k-grams and char. frequencies.
Let's do an example. Our input is banana and our k is 2.
| a | b | n |
ba | 0 | 0 | 1 |
an | 2 | 0 | 0 |
na | 0 | 1 | 1 |
ab | 1 | 0 | 0 |
Character Frequencies
Worth practicing. The input is alfalfa and k is 3.
Character Frequencies
| a | l | f |
alf | 2 | 0 | 0 |
lfa | 1 | 1 | 0 |
fal | 0 | 0 | 1 |
faa | 0 | 1 | 0 |
aal | 0 | 0 | 1 |
Worth practicing. The input is alfalfa and k is 3.
Testing the Constructor
All of the work we did up until now was in the constructor.
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
int order() | Returns k. Store k as an IV. |
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
int order() | Returns k. Store k as an IV. |
String toString() | "aa: a 1 g 1\nag: a 3 g 2\ncg: ..." |
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
int order() | Returns k. Store k as an IV. |
String toString() | "aa: a 1 g 1\nag: a 3 g 2\ncg: ..." |
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
int order() | Returns k. Store k as an IV. |
String toString() | "aa: a 1 g 1\nag: a 3 g 2\ncg: ..." |
int freq(String kgram) | How many times does kgram appear in the text? |
Testing the Constructor
All of the work we did up until now was in the constructor.
Once you have it working, you're ~50% done.
Here are some methods to help you test the constructor:
int order() | Returns k. Store k as an IV. |
String toString() | "aa: a 1 g 1\nag: a 3 g 2\ncg: ..." |
int freq(String kgram) | How many times does kgram appear in the text? |
int freq(String kgram, char c) | How many times does c appear after kgram in the text? |
Generating Text
Okay, now that we've built our model, let's use it!
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
int i = StdRandom.discrete(a);
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
int i = StdRandom.discrete(a);
// 50% chance i is 2, 50% chance i is 3
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
int i = StdRandom.discrete(a);
// 50% chance i is 2, 50% chance i is 3
int[] b = {1, 2, 3, 4};
int j = StdRandom.discrete(b);
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
int i = StdRandom.discrete(a);
// 50% chance i is 2, 50% chance i is 3
int[] b = {1, 2, 3, 4};
int j = StdRandom.discrete(b);
How likely is it for j to be 3?
Generating Text
Okay, now that we've built our model, let's use it!
There's just one method left: char random(String kgram).
int[] a = {0, 0, 1, 1};
int i = StdRandom.discrete(a);
// 50% chance i is 2, 50% chance i is 3
int[] b = {1, 2, 3, 4};
int j = StdRandom.discrete(b);
How likely is it for j to be 3?
4/(1+2+3+4) = 40%.
Lastly, the Client
TextGenerator.java is a short program.
Lastly, the Client
TextGenerator.java is a short program.
It uses StdIn.readAll().
Lastly, the Client
TextGenerator.java is a short program.
It uses StdIn.readAll().
Prints output one character at a time (to avoid having to build the entire output in a String).
Lastly, the Client
TextGenerator.java is a short program.
It uses StdIn.readAll().
Prints output one character at a time (to avoid having to build the entire output in a String).
Lastly, the Client
TextGenerator.java is a short program.
It uses StdIn.readAll().
Prints output one character at a time (to avoid having to build the entire output in a String).
Try generating papers, lyrics, code, music, etc. with your MarkovModel!
End
That's it! Thanks for coming.
Good luck on the assignment.
I'm happy to stay after if you have any questions.