1 of 76

Markov Model

Assignment Guide

How does predictive text work?

2 of 76

Markov Model

So what is a Markov Model?

3 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

4 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

5 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

There are two phases to this process:

6 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

There are two phases to this process:

  • Use the input to build a model.

7 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

There are two phases to this process:

  • Use the input to build a model.
  • Use the model to generate text.

8 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

There are two phases to this process:

  • Use the input to build a model.
  • Use the model to generate text.

Side note: This is how a lot of artificial intelligence and machine learning works.

9 of 76

Markov Model

So what is a Markov Model?

It takes, as input, a writing sample.

It generates text in the same "style" as the input.

There are two phases to this process:

  • Use the input to build a model.
  • Use the model to generate text.

The biggest difference between this assignment and our previous assignments is the constructor does a lot of work.

10 of 76

Constructor

As mentioned, the constructor for this program is where a lot of the magic happens.

11 of 76

Constructor

As mentioned, the constructor for this program is where a lot of the magic happens.

The constructor uses a data structure you'll learn more about called a symbol table (ST.java).

12 of 76

Constructor

As mentioned, the constructor for this program is where a lot of the magic happens.

The constructor uses a data structure you'll learn more about called a symbol table (ST.java).

A symbol table is like a encyclopedia, you can look something up and find out more information about it.

13 of 76

Constructor

As mentioned, the constructor for this program is where a lot of the magic happens.

The constructor uses a data structure you'll learn more about called a symbol table (ST.java).

A symbol table is like a encyclopedia, you can look something up and find out more information about it.

For this program, we're using our "encyclopedia" to look up how often a specific letter appears after a given word.

14 of 76

K-grams

We have some input text. We divide it into "k-grams".

15 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

16 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

17 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

18 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

19 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

20 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

4?

21 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

4?

Nope.

22 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

4?

Nope.

23 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

24 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

am_

25 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

am_

m_s

26 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

am_

m_s

_sa

27 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

am_

m_s

_sa

28 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

am_

m_s

_sa

Skipping ahead...

29 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

her

pal

am_

aw_

er_

m_s

w_h

r_p

_sa

_he

_pa

13?

30 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

her

pal

am_

aw_

er_

al.

m_s

w_h

r_p

_sa

_he

_pa

31 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

her

pal

am_

aw_

er_

al.

m_s

w_h

r_p

_sa

_he

_pa

14?

32 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

her

pal

am_

aw_

er_

al.

m_s

w_h

r_p

l.S

_sa

_he

_pa

33 of 76

K-grams

We have some input text. We divide it into "k-grams".

A k-gram is a series of k adjacent characters.

How many k-grams are in this text if k is 3?

Sam saw her pal.

Sam

saw

her

pal

am_

aw_

er_

al.

m_s

w_h

r_p

l.S

_sa

_he

_pa

.Sa

34 of 76

Character Frequencies

We read our input text, k characters at a time.

35 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

36 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

37 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

38 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

39 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

0

40 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

0

You need a column for every possible character.

41 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

42 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

43 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

0

0

0

44 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

1

0

0

45 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

1

0

0

46 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

1

0

0

na

0

0

0

47 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

1

0

0

na

0

0

1

48 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

1

0

0

na

0

0

1

49 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

2

0

0

na

0

0

1

50 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

2

0

0

na

0

0

1

51 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

2

0

0

na

0

1

1

52 of 76

Character Frequencies

We read our input text, k characters at a time.

For each k-gram, we record the character after it.

We're building a table of k-grams and char. frequencies.

Let's do an example. Our input is banana and our k is 2.

a

b

n

ba

0

0

1

an

2

0

0

na

0

1

1

ab

1

0

0

53 of 76

Character Frequencies

Worth practicing. The input is alfalfa and k is 3.

54 of 76

Character Frequencies

a

l

f

alf

2

0

0

lfa

1

1

0

fal

0

0

1

faa

0

1

0

aal

0

0

1

Worth practicing. The input is alfalfa and k is 3.

55 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

56 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

57 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

58 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

int order()

Returns k. Store k as an IV.

59 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

int order()

Returns k. Store k as an IV.

String toString()

"aa: a 1 g 1\nag: a 3 g 2\ncg: ..."

60 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

int order()

Returns k. Store k as an IV.

String toString()

"aa: a 1 g 1\nag: a 3 g 2\ncg: ..."

61 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

int order()

Returns k. Store k as an IV.

String toString()

"aa: a 1 g 1\nag: a 3 g 2\ncg: ..."

int freq(String kgram)

How many times does kgram appear in the text?

62 of 76

Testing the Constructor

All of the work we did up until now was in the constructor.

Once you have it working, you're ~50% done.

Here are some methods to help you test the constructor:

int order()

Returns k. Store k as an IV.

String toString()

"aa: a 1 g 1\nag: a 3 g 2\ncg: ..."

int freq(String kgram)

How many times does kgram appear in the text?

int freq(String kgram,

char c)

How many times does c appear after kgram in the text?

63 of 76

Generating Text

Okay, now that we've built our model, let's use it!

64 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

65 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

66 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

int i = StdRandom.discrete(a);

67 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

int i = StdRandom.discrete(a);

// 50% chance i is 2, 50% chance i is 3

68 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

int i = StdRandom.discrete(a);

// 50% chance i is 2, 50% chance i is 3

int[] b = {1, 2, 3, 4};

int j = StdRandom.discrete(b);

69 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

int i = StdRandom.discrete(a);

// 50% chance i is 2, 50% chance i is 3

int[] b = {1, 2, 3, 4};

int j = StdRandom.discrete(b);

How likely is it for j to be 3?

70 of 76

Generating Text

Okay, now that we've built our model, let's use it!

There's just one method left: char random(String kgram).

int[] a = {0, 0, 1, 1};

int i = StdRandom.discrete(a);

// 50% chance i is 2, 50% chance i is 3

int[] b = {1, 2, 3, 4};

int j = StdRandom.discrete(b);

How likely is it for j to be 3?

4/(1+2+3+4) = 40%.

71 of 76

Lastly, the Client

TextGenerator.java is a short program.

72 of 76

Lastly, the Client

TextGenerator.java is a short program.

It uses StdIn.readAll().

73 of 76

Lastly, the Client

TextGenerator.java is a short program.

It uses StdIn.readAll().

Prints output one character at a time (to avoid having to build the entire output in a String).

74 of 76

Lastly, the Client

TextGenerator.java is a short program.

It uses StdIn.readAll().

Prints output one character at a time (to avoid having to build the entire output in a String).

75 of 76

Lastly, the Client

TextGenerator.java is a short program.

It uses StdIn.readAll().

Prints output one character at a time (to avoid having to build the entire output in a String).

Try generating papers, lyrics, code, music, etc. with your MarkovModel!

76 of 76

End

That's it! Thanks for coming.

Good luck on the assignment.

I'm happy to stay after if you have any questions.