1 of 20

PROBLEM SOLVING TECHNIQUES

By

Dept. of CSE

PVPSIT, Kanuru.

PRASAD V. POTLURI SIDDHARTHA INSTITUTE OF TECHNOLOGY

2 of 20

B. Vinay Kumar

2025-26

TEXT PROCESSING

AND

PATTERN SEARCHING

UNIT V

PVPSIT (Autonomous)

Problem Solving Techniques

3 of 20

  • It has long been recognized that computers are well equipped to perform other than numerical computations.
  • Computers can process textual information too.
  • Computational strategies and algorithms used to deal with numerical data are different from textual data.
  • The basic unit is now a character rather than a number.
  • Our major concerns in processing text are centered around either manipulation and movement of characters or searching for patterns or words.
  • In this Chapter we are going to discuss:

1) KEYWORD SEARCH IN TEXT

2) TEXT LINE EDITING

3) LINEAR PATTERN SEARCH

4) SUBLINEAR PATTERN SEARCH

B. Vinay Kumar

2025-26

Introduction

PVPSIT (Autonomous)

Problem Solving Techniques

4 of 20

  • Problem
  • Count the number of times a particular word occurs in a given text.
  • Algorithm development
  • The text searching problem and related variations of it are commonly encountered in computing.
  • Example: whether or not the word SENTENCE occurs in the text:

THIS IS A SENTENCE OF TEXT

  • To solve this we will need to compare in some way the characters in the word with the characters in the text.

B. Vinay Kumar

2025-26

Algorithm 1

KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

5 of 20

  • To examine text in a computer we are limited to examining it one character at a time.
  • A word is a consecutive sequence of alphabetic characters.
  • A word in a text may be preceded and followed by one or more non-alphabetic characters (e.g. space(s), punctuation marks, end-of-line characters, end-of-file characters).
  • A text is a sequence of words separated by non-alphabetic characters.
  • Two words can be said to match when they have same characters from first to last to the length of both the word sought and the text word.

B. Vinay Kumar

2025-26

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

6 of 20

B. Vinay Kumar

2025-26

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

7 of 20

  • A mismatch between two words mean that the two words are not having same characters or in other words the number of character-for-character matches is less than the length of the two words.
  • For example:
  • The word SENT matches the word SENTENCE but we cannot say that there is a word match.
  • For example, we have

B. Vinay Kumar

2025-26

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

8 of 20

This algorithm is essentially a String Searching process. It acts like a magnifying glass moving across a sentence, checking if the letters underneath match your target word one by one.

Let’s visualize this using the word "INK" as our target pattern within the text "PINK INK."

Visualizing the Execution

Imagine your target word INK has a pointer i that tracks which letter you are currently looking for.

Scenario 1: The "Partial Match" (The word "PINK")

  1. Read 'P': Does it match 'I' (Word[1])? No. (Mismatch situation: reset i to 1).
  2. Read 'I': Does it match 'I'? Yes! (Extend match: i moves to 2).
  3. Read 'N': Does it match 'N' (Word[2])? Yes! (Extend match: i moves to 3).
  4. Read 'K': Does it match 'K' (Word[3])? Yes!
    • Pattern Match Found!
    • Word Match Check: The algorithm looks at the character after the 'K'. It’s a space. It looks at the character before the 'I'. It's a 'P'.
    • Result: Since 'P' is a letter, "INK" is part of "PINK." It is not a standalone word match.

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

9 of 20

Scenario 2: The "Total Match" (The word "INK")

  1. Read ' ' (Space): No match.
  2. Read 'I': Match! i = 2.
  3. Read 'N': Match! i = 3.
  4. Read 'K': Match!
    • Pattern Match Found!
    • Word Match Check: The character before was a space, and the character after is a period (.).
    • Result: Success! This is a Word Match.

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

10 of 20

B. Vinay Kumar

2025-26

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

11 of 20

  • In reading the text we must take into account end-of-lines.
  • In general, words are preceded by and followed by non-alphabetic characters.

  • The very first word is not preceded by a non-alphabetic character.

B. Vinay Kumar

2025-26

Algorithm 1- KEYWORD SEARCH IN TEXT

PVPSIT (Autonomous)

Problem Solving Techniques

12 of 20

2025-26

Match corresponds to a full word (word boundaries) rather than a substring

  • Post-Match Check: Identifying if the character following a matched pattern is alphabetic is straightforward (e.g., using a set test).
  • Pre-Match Challenge: Identifying the character preceding the pattern is more difficult, requiring a way to "remember" or save that character.
  • Saving State on Mismatch: When a "mismatch" occurs between the text character and the pattern, that specific text character should be saved. This saved character helps determine if a valid word boundary exists.
  • Constant Saved State: The "saved" character is not updated during a successful match, only during a mismatch, ensuring it holds the necessary context of what preceded the word.

Application

This technique is part of a "Problem Solving Approach in Computing" that involves establishing a search word, maintaining a match count, and managing pointers for a word array to navigate through text, particularly when checking for words that are preceded and followed by non-alphabetic characters.

PVPSIT (Autonomous)

Problem Solving Techniques

13 of 20

B. Vinay Kumar

2025-26

Steps Upon Mismatch

When a mismatch occurs, the following steps must be taken to reset the search:

  1. Save current text character as pre: Update the variable pre with the most recent text character.
  2. Reset word array pointer: Move the pointer i back to the first character of the word array.

Initializations

To handle edge cases (such as the first word):

  • Initialize pre: Set the pre variable to a non-alphabetic character before beginning, ensuring the first word is handled correctly even if it isn't preceded by one.
  • Initialize pointer i: Set the word array pointer to the first position initially.

Steps Upon Word-Match

When a full word-match is established, the process is similar to a mismatch, with added tracking:

  1. Increment match count by one.
  2. Set pre to the most recent character read (the character that follows the matched word, which might act as the precursor to the next word).
  3. Reset pointer i to the first position of the word array.

PVPSIT (Autonomous)

Problem Solving Techniques

14 of 20

  • i:=1; wc:=0; n := wlength; m := textlength
  • while (i <= m-n+1) do

j :=1 //search word pointer

while (j <=n and i <= m-n+1) do

if (T[i] = w[j]) then

i := i+1

j := j+1

else

i := i+1

j := 1

if (j>n and T[i] = non-alphabet and T[i-j] = non-alphabet) then

wc:= wc+1

i := i+1

    • writeout (wc)

B. Vinay Kumar

2025-26

Pseudocode

PVPSIT (Autonomous)

Problem Solving Techniques

15 of 20

  • Assume : Text Array T and Word Array W are already Established.
  • Step 0: Start
  • Step 1: Initialize Text Index i:=1, nmatches wc:=0, Word length n := wlength, Text Length m := textlength
  • Step 2: Repeat Steps 2.1 and 2.2 till (i <= (m-n+1))

Step 2.1: Initialize word index j :=1

Step 2.2: Repeat Step 2.2.1 till ((j <=n) and (i <= (m-n+1)))

Step 2.2.1: if (T[i] = w[j]) then

i := i+1

j := j+1

else

i := i+1

j := 1

Step 2.3: if (j>m and T[i] = non-alphabet and T[i-j] = non-alphabet) then

wc:= wc+1; i := i+1

    • Step 3: Display nmatches (wc)

B. Vinay Kumar

2025-26

Algorithm

PVPSIT (Autonomous)

Problem Solving Techniques

16 of 20

B. Vinay Kumar

2025-26

PVPSIT (Autonomous)

Problem Solving Techniques

17 of 20

  • Applications:
  • Limited text searching.

B. Vinay Kumar

2025-26

PVPSIT (Autonomous)

Problem Solving Techniques

18 of 20

  • 6.3.1 Modify this algorithm so that it will terminate on finding the first complete word-match.
  • Algorithm description
  • Establish the word and wordlength wlength of the search-word.
  • Initialize the match-count nmatches, set pointer for word array i to 1.
  • While not at end-of-file do

(a) While not end-of-line do

(a.1) Read next character;

(a.2) If current text character chr matches ith character in word then

(2.a) Extend partial match i by 1,

B. Vinay Kumar

2025-26

Supplementary problems

PVPSIT (Autonomous)

Problem Solving Techniques

19 of 20

(2.b) If a word-pattern match then

(b.1) Read next character post

(b.2) If preceding and following character not alphabetic then

(2.a) Update match count nmatches

(2.b) break

else

(2’.a) Save current text character as preceding character for match,

(2’.b) Reset word array pointer I to first position;

(b) read past end-of-line

B. Vinay Kumar

2025-26

PVPSIT (Autonomous)

Problem Solving Techniques

20 of 20

  • Pseudo-code
  • i:=1; j:=1;wc:=0; n := wlength

while (T[i] != EOF) do

While (j<=n) do

if T[i] == w[j] then

i:= i+1

j:= j+1

else

j:=1 //search word pointer

i:=i+1 // text pointer

while (T[i]!= non-alphabet) do

i:= i+1

i:= i+1

j:=j-1

if j==n && T[i] == non-alphabet then

wc:= wc+1

break

B. Vinay Kumar

2025-26

PVPSIT (Autonomous)

Problem Solving Techniques