1 of 22

LINEAR PATTERN SEARCH

Algorithm - 3

By

Dept. of CSE

PVPSIT, Kanuru.

PRASAD V. POTLURI SIDDHARTHA INSTITUTE OF TECHNOLOGY

2 of 22

  • Problem
  • Design and implement a pattern searching algorithm with a performance that is linearly dependent on the length of the string or text being searched. A count should be made of the number of times the search pattern occur in the string.
  • Algorithm development (Inefficient algorithm)
  • To make a start on the linear pattern search algorithm, let us examine the simple pattern matching algorithm:

B. Vinay Kumar

Friday, March 18, 2022

Algorithm 3

LINEAR PATTERN SEARCH

PVPSIT (Autonomous)

Problem Solving Techniques

3 of 22

  • In our example, the match between the pattern and the string proceeds until we get a mismatch with the d and the c in the sixth position.
  • The matching resumes by comparing the second character in the string with the first character in the search pattern.
  • In making this step we need to re-examine characters in the string that we have already looked at before.
  • It is this repeated re-examination is leads to inefficiency of the simple pattern searching algorithm.
  • Our task is to set up a pattern searching algorithm such that it is never necessary to re-examine earlier characters in the string when a mismatch occurs.
  • If we are not to re-examine any earlier characters in the text we need to find a way of using the information gained in making the mismatch.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

4 of 22

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

5 of 22

  • From the example, for the positions already examined in the partial match at string position one, it can be seen that the second and third positions offer no possibility of a match.
  • The fourth position offers a possibility of a match because the fourth string character matches the first pattern character and so on.
  • Having discovered that the pattern may be able to make a match starting at a position (the fourth) earlier than the mismatch position (i.e. the sixth position) we must try to understand what this means.
  • We see that the reason that a match can start before the mismatch position is because the beginning of the string is repeated before the mismatch.
  • It is this partial match with the repeated part of the pattern that prevents us from resuming the search directly after the mismatch by comparing the first character in the pattern with the seventh character in the string.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

6 of 22

  • Compare the third character in the pattern with the sixth character in the string as this would introduce no risk of missing a possible match with the pattern positioned starting at the fourth character.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

7 of 22

  • To discover this let us look more closely at another pattern.

  • If we get a mismatch between the first character in the pattern and the current string character, we can compare the first character in the pattern with the next character in the string.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

8 of 22

  • If, however, the first pattern character matches with the string and it is followed by a mismatch, we can once again compare the first character in the pattern with the next character in the string.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

9 of 22

  • As a further example, suppose the first five characters in the pattern are matched at the current location in the string and there is a mismatch at the sixth character.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

10 of 22

  • In this latest situation, after taking into account the mismatch at the sixth character, the extent of the match may actually be two because of the way in which ab is repeated in the pattern.
  • It follows that we can continue the match by comparing the third character in the pattern with the sixth character in the string.
  • Below table gives the smaller partial matches that will exist when the current mismatch occurs.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

11 of 22

  • To establish these smaller partial matches needed to recover from a mismatch situation, we need to model the process we have just developed by hand.
  • The amounts to taking one copy of the pattern and considering it as a fixed string.
  • The other copy of the pattern is then located at positions relative to the first copy.
  • Whenever a partial match is established (as for example, starting at the fourth position), the positions that are part of the partial match(i.e. 5, 6, and 7 in our example) are then not considered as positions to look for partial matches.
  • The reason for this is that they could only potentially provide smaller partial matches than the one that has already been established.
  • A little thought reveals that we need to always consider the largest partial matches to avoid missing complete matches when recovering from mismatches.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

12 of 22

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

13 of 22

  • Having worked out how to derive the partial match information needed for our linear search algorithm, let us try to develop a searching algorithm that used it.
  • At the outset it seems fairly clear that we will need to handle the match and mismatch situations between the pattern and the string differently.
  • Basically, what we must do is consider “placing” the pattern relative to the string as dictated by the partial matches, complete matches, and mismatches that occur.
  • Our basic strategy might be:

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

14 of 22

  • To extend the partial match we will need a mechanism that moves to the next positions in string and pattern.

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

15 of 22

  • The mechanism needed to handle the mismatch condition is therefore:

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

16 of 22

B. Vinay Kumar

Friday, March 18, 2022

PVPSIT (Autonomous)

Problem Solving Techniques

17 of 22

Calculating LPS table

The Knuth Morris Pratt Algorithm = KMP algorithm

LPS - The longest prefix of the substring which is also a suffix.

PVPSIT (Autonomous)

Problem Solving Techniques

18 of 22

1. Pre-processing: The LPS Table

Before searching, we create the LPS (Longest Prefix Suffix) table.

This table tells the algorithm the "recovery position"—how many characters to skip when a mismatch occurs.

Pattern (P): abcabd

  • At Index 3 (a): The prefix a matches the suffix a. Value = 1.
  • At Index 4 (b): The prefix ab matches the suffix ab. Value = 2.
  • At Index 5 (d): No prefix matches the suffix. Value resets to 0.

Index

0

1

2

3

4

5

Char

a

b

c

a

b

d

LPS Value

0

0

0

1

2

0

PVPSIT (Autonomous)

Problem Solving Techniques

19 of 22

Linear Search Pattern Algorithm

PVPSIT (Autonomous)

Problem Solving Techniques

20 of 22

Step

i (String)

j (Pattern)

Comparison

Action

1

0-4

0-4

abcab matches abcab

Matches!

Increment both i and j

2

5 (c)

5 (d)

Mismatch!

Look at LPS[j-1] (LPS of index 4). The value is 2.

So, j=2

3

5 (c)

2 (c)

Match!

Instead of restarting at i=1, we keep i=5 and set j=2. Since S[5] and P[2] match, increment both.

4

6-8

3-5

abd

Continue matching until the end of the pattern.

2. The Search (Dry Run) String (S): abcabcabcabd Pattern (P): abcabd

Index

0

1

2

3

4

5

6

7

8

9

10

11

Character

a

b

c

a

b

c

a

b

c

a

b

d

Index (j)

0

1

2

3

4

5

Pattern

a

b

c

a

b

d

LPS Value

0

0

0

1

2

0

PVPSIT (Autonomous)

Problem Solving Techniques

21 of 22

Why this is "Linear"

In a naive search, if you fail at the 6th character, you might have to go back and start again from the 2nd character.

In Linear Search Pattern, the string pointer (i) never moves backward. It only ever stays still or moves forward, which is why the time complexity is O(n).

PVPSIT (Autonomous)

Problem Solving Techniques

22 of 22

  • Pattern searching in small alphabet systems and multiple keyword searching in text.

B. Vinay Kumar

Friday, March 18, 2022

Applications

PVPSIT (Autonomous)

Problem Solving Techniques