1 of 52

Natural Language Processing

By

S.V.V.D.Jagadeesh

Sr. Assistant Professor

Dept of Artificial Intelligence & Data Science

LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING

2 of 52

At the end of this unit, students will be able to:

  • CO5: Compare the use of different statistical approaches for different types of NLP applications.

S.V.V.D.Jagadeesh

Sunday, March 29, 2026

Session Outcomes

LBRCE

NLP

3 of 52

  • Discourse segmentation is the process of dividing a text into coherent units called discourse segments.
  • These segments can be:
      • sentences
      • clauses
      • topic-based units
  • Each segment represents a self-contained unit of meaning within the discourse.
  • Discourse segmentation is fundamental in NLP because it lets systems model the structure and flow of a text.

Discourse Segmentation

4 of 52

  • 1. Improves Text Understanding
      • Breaking text into segments helps identify:
          • topics
          • relationships
          • meaning boundaries
  • 2. Supports Downstream NLP Tasks
      • Machine Translation
      • Summarization
      • Question Answering
      • Dialogue Systems

Why Is Discourse Segmentation Important?

5 of 52

  • 3. Helps in Coherence Modeling
      • Segments help determine:
          • logical connections
          • discourse relations
  • Example
      • John went to the bank. He deposited money. Then he left.
  • Segments:
      • Going to the bank
      • Depositing money
      • Leaving

Why Is Discourse Segmentation Important?

6 of 52

  • Sentence-Level Segmentation
  • Clause-Level Segmentation
  • Topic-Based Segmentation

Types of Discourse Segmentation

7 of 52

Sentence-Level Segmentation

8 of 52

  • Rule Based Method
  • Probabilistic Method (Naive Bayes)
  • ML Based Method

Methods for Sentence-Level Segmentation

9 of 52

Rule Based Method

10 of 52

  • Step-by-Step Example
  • Input
      • Dr. Smith went to the bank. He deposited money.
  • Step 1: Tokenize
      • [Dr., Smith, went, to, the, bank., He, deposited, money.]
  • Step 2: Identify Punctuation
      • Candidates: Dr., bank., money.

Rule Based Method

11 of 52

  • Step 3: Apply Rules
      • Token "Dr.": ends with "." ✔ but is an abbreviation → NOT a boundary
      • Token "bank.": ends with "." ✔ and the next word "He" is capitalized ✔ → boundary ✔
      • Token "money.": ends with "." ✔ and is the last token of the input ✔ → boundary ✔
  • Final Segmentation
      • Dr. Smith went to the bank.
      • He deposited money.

Rule Based Method
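The three rules from the worked example can be sketched in Python. This is a minimal illustration, not a complete segmenter: the `ABBREVIATIONS` set and the function name `split_sentences` are assumptions introduced here, and only the period is treated as a boundary candidate, as in the example.

```python
# Hypothetical abbreviation list; a practical system would need a far larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof."}


def split_sentences(text):
    """Split text into sentences with the abbreviation / capital / end-of-input rules."""
    tokens = text.split()                      # Step 1: whitespace tokenization
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if not token.endswith("."):
            continue                           # Step 2: not a boundary candidate
        if token.lower() in ABBREVIATIONS:
            continue                           # Rule: abbreviation, so no boundary
        is_last = i == len(tokens) - 1
        next_is_capital = (not is_last) and tokens[i + 1][0].isupper()
        if is_last or next_is_capital:         # Rule: end of input or capital follows
            sentences.append(" ".join(current))
            current = []
    if current:                                # leftover text without a final period
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Smith went to the bank. He deposited money."))
# -> ['Dr. Smith went to the bank.', 'He deposited money.']
```

The sketch inherits the weaknesses of the rules themselves: an abbreviation missing from the list that is followed by a capitalized name would be split incorrectly.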

12 of 52

Probabilistic Method

15 of 52

Machine Learning Approach

16 of 52

  • Model predicts:
      • boundary = 1
  • Complete Worked Example
  • Input Text
      • Dr. John went to the bank. He deposited money. It was closed.
  • Step 1: Identify Candidates
      • Dr.
      • bank.
      • money.
      • closed.

Machine Learning Approach

17 of 52

  • Step 2: Apply Rules
      • Dr. → abbreviation → NOT boundary
      • bank. → next word capital → boundary
      • money. → next word capital → boundary
      • closed. → end → boundary
  • Final Output
      • Dr. John went to the bank.
      • He deposited money.
      • It was closed.

Machine Learning Approach
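In a learned setting, each candidate from the worked example is turned into a feature vector and a classifier outputs boundary = 1 or 0. The sketch below shows only that framing. The feature names, the `ABBREVIATIONS` set, and `predict_boundary` are assumptions made for illustration; `predict_boundary` is a hand-written stand-in for a trained model, not a model learned from data.

```python
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof."}  # hypothetical list


def boundary_features(text):
    """Return (token, features) for every token that ends with a period."""
    tokens = text.split()
    rows = []
    for i, token in enumerate(tokens):
        if not token.endswith("."):
            continue                            # not a candidate
        is_last = i == len(tokens) - 1
        rows.append((token, {
            "is_abbreviation": token.lower() in ABBREVIATIONS,
            "next_is_capitalized": (not is_last) and tokens[i + 1][0].isupper(),
            "is_last_token": is_last,
        }))
    return rows


def predict_boundary(features):
    """Stand-in for a trained classifier: 1 = boundary, 0 = no boundary."""
    if features["is_abbreviation"]:
        return 0
    return int(features["next_is_capitalized"] or features["is_last_token"])


text = "Dr. John went to the bank. He deposited money. It was closed."
for token, features in boundary_features(text):
    print(token, predict_boundary(features))
```

A real classifier would learn the weight of each feature from labeled examples instead of hard-coding the decision.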

18 of 52

  • Clause-level segmentation is the process of dividing a sentence into clauses, where each clause contains:
      • a predicate (verb)
      • optionally a subject and objects
  • A clause represents a minimal unit of meaning within a sentence.
  • Types of Clauses
      • Independent Clause
          • Can stand alone as a sentence
          • Example: He went to the bank.

Clause-Level Segmentation

20 of 52

  • Clause boundaries are identified using:
      • Conjunctions: and, but, because, although, if
      • Relative pronouns: who, which, that
      • Punctuation: comma (,), semicolon (;)
      • Verbs: each clause typically has at least one verb

Linguistic Cues for Clause-Level Segmentation

21 of 52

  • Rule Based Method
  • Parse Tree Method
  • Probabilistic Method (Naive Bayes)
  • ML Based Method

Methods for Clause-Level Segmentation

22 of 52

Rule Based Method

23 of 52

  • Step-by-Step Example
  • Input Sentence
      • Although he was tired, he went to the bank and deposited money.
  • Step 1: Tokenization
      • [Although, he, was, tired, ",", he, went, to, the, bank, and, deposited, money]
  • Step 2: Identify Clause Markers
      • Although → subordinating conjunction (opens a subordinate clause)
      • comma → separator
      • and → coordinating conjunction

Rule Based Method

24 of 52

  • Step 3: Apply Rules
      • Clause 1: Although he was tired
          • Reason: starts with the subordinating conjunction "Although" and contains the verb "was"
      • Clause 2: he went to the bank
      • Clause 3: and deposited money

Rule Based Method

25 of 52

  • Final Segmentation
      • [Although he was tired]
      • [he went to the bank]
      • [and deposited money]

Rule Based Method
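The marker rules from this example can be sketched in Python. This is a simplified illustration under stated assumptions: the `COORDINATORS` set and the name `split_clauses` are introduced here, commas and semicolons always close a clause, and no verb check is performed because that would need a part-of-speech tagger.

```python
import re

COORDINATORS = {"and", "but", "or"}  # hypothetical marker list


def split_clauses(sentence):
    """Split a sentence at commas, semicolons, and coordinating conjunctions."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)   # words and punctuation marks
    clauses, current = [], []
    for token in tokens:
        if token in {",", ";"}:
            if current:                             # punctuation closes the clause
                clauses.append(" ".join(current))
            current = []
        elif token in {".", "?", "!"}:
            continue                                # drop sentence-final punctuation
        elif token.lower() in COORDINATORS and current:
            clauses.append(" ".join(current))       # conjunction opens a new clause
            current = [token]
        else:
            current.append(token)
    if current:
        clauses.append(" ".join(current))
    return clauses


print(split_clauses("Although he was tired, he went to the bank and deposited money."))
# -> ['Although he was tired', 'he went to the bank', 'and deposited money']
```

Because there is no verb check, the sketch would also split a noun phrase such as "loans and savings", which is one reason purely rule-based clause segmentation is usually combined with syntactic information.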

26 of 52

Parse-Tree Based Method

27 of 52

  • Parse Structure
      S
      ├── NP (He)
      └── VP
          ├── V (said)
          └── S
              ├── that
              ├── NP (she)
              └── VP (left)
  • Segmentation
      • He said
      • that she left

Parse-Tree Based Method
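The segmentation read off the parse structure can be reproduced with a short tree walk: every S node opens a new clause, and each word is assigned to the nearest enclosing S. The nested-tuple tree encoding and the function names below are assumptions made for this sketch; a real pipeline would take the tree from a parser.

```python
# Tree encoding (assumed for this sketch): (label, child, child, ...),
# where a child is either another tuple or a word given as a string.
TREE = ("S",
        ("NP", "He"),
        ("VP",
         ("V", "said"),
         ("S", "that", ("NP", "she"), ("VP", "left"))))


def _collect(node, current, clauses):
    label, children = node[0], node[1:]
    if label == "S":                 # each S node starts its own clause
        current = []
        clauses.append(current)
    for child in children:
        if isinstance(child, str):
            current.append(child)    # word belongs to the nearest enclosing S
        else:
            _collect(child, current, clauses)


def clauses_from_tree(tree):
    """Return one string per S node; the root of the tree must be an S node."""
    clauses = []
    _collect(tree, None, clauses)
    return [" ".join(words) for words in clauses]


print(clauses_from_tree(TREE))
# -> ['He said', 'that she left']
```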

28 of 52

Probabilistic Method

30 of 52

Machine Learning Method

31 of 52

  • Example
  • Input
      • If he studies hard, he will pass and get a job.
  • Step 1: Identify Markers
      • If → clause start
      • comma → boundary
      • and → clause link

Machine Learning Method

32 of 52

  • Step 2: Segment
      • Clause 1: If he studies hard
      • Clause 2: he will pass
      • Clause 3: and get a job
  • Final Output
      • [If he studies hard]
      • [he will pass]
      • [and get a job]

Machine Learning Method
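One common way to pose clause segmentation as a learning problem is token-level labeling: each token gets label 1 if it starts a clause and 0 otherwise. The sketch below only shows how the segmented output of this example could be turned into labeled training rows. The `MARKERS` set, the single `is_marker` feature, and the function name are assumptions for illustration, not a prescribed feature set.

```python
# Hypothetical marker list drawn from the cue words discussed in this unit.
MARKERS = {"if", "although", "because", "and", "but", "who", "which", "that"}


def training_rows(clauses):
    """Return (token, features, label) rows; label 1 marks the first token of a clause."""
    rows = []
    for clause in clauses:
        for position, token in enumerate(clause.split()):
            features = {"is_marker": token.lower() in MARKERS}
            rows.append((token, features, 1 if position == 0 else 0))
    return rows


segmented = ["If he studies hard", "he will pass", "and get a job"]
for token, features, label in training_rows(segmented):
    print(token, features["is_marker"], label)
```

The second clause starts with "he", which is not a marker word, so a model would need further features (for example, the preceding comma) to predict that boundary.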

33 of 52

  • Topic-based segmentation divides a text into segments such that each segment contains sentences discussing the same topic, and boundaries occur where the topic changes.
  • Core Idea
      • Represent each sentence as a vector of words (or features)
      • Measure similarity between adjacent sentences or blocks
      • Detect drops in similarity → topic boundary

Topic-Based Segmentation

35 of 52

  • Example
  • Input Text
      • S1: The bank offers loans and savings accounts.
      • S2: It provides credit cards and financial services.
      • S3: Customers can open accounts easily.
      • S4: The river bank was flooded after heavy rain.
      • S5: Water levels increased rapidly near the bank.
  • Step 1: Build Vocabulary
      • bank, loans, savings, accounts, credit, financial, customers, river, water, flooded

Topic-Based Segmentation
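With the vocabulary fixed, each sentence can be mapped to a binary vector that records which vocabulary terms it contains. A minimal Python sketch follows; the function name `to_binary_vector` and the simple lowercase regular-expression tokenizer are assumptions made for illustration.

```python
import re

VOCABULARY = ["bank", "loans", "savings", "accounts", "credit",
              "financial", "customers", "river", "water", "flooded"]

SENTENCES = [
    "The bank offers loans and savings accounts.",
    "It provides credit cards and financial services.",
    "Customers can open accounts easily.",
    "The river bank was flooded after heavy rain.",
    "Water levels increased rapidly near the bank.",
]


def to_binary_vector(sentence, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary term present in the sentence, else 0."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if term in words else 0 for term in vocabulary]


for index, sentence in enumerate(SENTENCES, start=1):
    print(f"S{index}", to_binary_vector(sentence))
```

Binary presence is the simplest choice; term counts or TF-IDF weights are common alternatives when sentences are longer.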

36 of 52

  • Step 2: Convert Sentences to Vectors

Sentence  bank  loans  savings  accounts  credit  financial  customers  river  water  flooded
S1        1     1      1        1         0       0          0          0      0      0
S2        0     0      0        0         1       1          0          0      0      0
S3        0     0      0        1         0       0          1          0      0      0
S4        1     0      0        0         0       0          0          1      0      1
S5        1     0      0        0         0       0          0          0      1      0

Topic-Based Segmentation

39 of 52

Pair    Similarity  Boundary?
S1–S2   0           YES
S2–S3   0           YES
S3–S4   0           YES
S4–S5   0.41        NO

Topic-Based Segmentation

40 of 52

  • Final Segmentation
      • [The bank offers loans and savings accounts.]
      • [It provides credit cards and financial services.]
      • [Customers can open accounts easily.]
      • --- Topic Shift ---
      • [The river bank was flooded after heavy rain. Water levels increased rapidly near the bank.]

Topic-Based Segmentation
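The similarity values and the resulting segments can be checked with a short Python sketch that applies cosine similarity to the binary sentence vectors of this example. The threshold of 0.2 is an assumption chosen for illustration (any value between 0 and 0.41 gives the same result here), and the function names are hypothetical.

```python
import math

# Binary sentence vectors from the example (vocabulary order: bank, loans, savings,
# accounts, credit, financial, customers, river, water, flooded).
VECTORS = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # S1
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],  # S2
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0],  # S3
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 1],  # S4
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],  # S5
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacent_similarities(vectors):
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


def segment(labels, vectors, threshold=0.2):
    """Start a new segment wherever adjacent similarity falls below the threshold."""
    segments = [[labels[0]]]
    for i, similarity in enumerate(adjacent_similarities(vectors)):
        if similarity < threshold:
            segments.append([])
        segments[-1].append(labels[i + 1])
    return segments


print([round(s, 2) for s in adjacent_similarities(VECTORS)])
# -> [0.0, 0.0, 0.0, 0.41]
print(segment(["S1", "S2", "S3", "S4", "S5"], VECTORS))
# -> [['S1'], ['S2'], ['S3'], ['S4', 'S5']]
```

Comparing single sentences with a small vocabulary yields many zero similarities, which is why S1, S2, and S3 each end up in their own segment; comparing larger blocks of text smooths this out.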

41 of 52

  • TextTiling is a topic-based discourse segmentation algorithm that divides text into coherent segments by detecting topic shifts using lexical similarity between blocks of text.
  • Core Idea
      • Divide text into small blocks (token sequences)
      • Compute similarity between adjacent blocks
      • Identify valleys (drops) in similarity → boundaries

TextTiling Method

45 of 52

  • Example
  • Input Text
      • S1: The bank offers loans and savings accounts.
      • S2: It provides credit cards and financial services.
      • S3: Customers can open accounts easily.
      • S4: The river bank was flooded after heavy rain.
      • S5: Water levels increased near the river.
      • S6: The area was evacuated due to flooding.
  • Step 1: Build Vocabulary
      • bank, loans, savings, accounts, credit, financial, customers, river, water, flooded, rain, evacuated

TextTiling Method

47 of 52

  • Step 4: Convert Blocks into Vectors
      • B1 (Finance): bank, loans, savings, accounts, credit, financial, customers
      • B2 (Mixed): accounts, customers, river, flooded
      • B3 (River): river, water, flooded, rain, evacuated

TextTiling Method
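The block word lists can be compared with cosine similarity and then scored for depth, which is the general shape of the TextTiling computation. The sketch below is a simplified illustration under stated assumptions: it uses the three word lists as given, a plain term-count cosine, and a depth score defined as the sum of the climbs from a gap to the nearest similarity peak on each side. The exact scores depend on how blocks and depth are defined, so they are not expected to reproduce the values tabulated in this deck.

```python
import math
from collections import Counter

BLOCKS = [
    ["bank", "loans", "savings", "accounts", "credit", "financial", "customers"],  # B1
    ["accounts", "customers", "river", "flooded"],                                 # B2
    ["river", "water", "flooded", "rain", "evacuated"],                            # B3
]


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(c * c for c in ca.values()))
            * math.sqrt(sum(c * c for c in cb.values())))
    return dot / norm if norm else 0.0


def gap_scores(blocks):
    """Similarity at each gap between adjacent blocks."""
    return [cosine(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


def depth_scores(gaps):
    """Depth of each gap: climb to the nearest peak on the left and on the right."""
    depths = []
    for i, score in enumerate(gaps):
        left = score
        for s in reversed(gaps[:i]):      # walk left while similarity keeps rising
            if s < left:
                break
            left = s
        right = score
        for s in gaps[i + 1:]:            # walk right while similarity keeps rising
            if s < right:
                break
            right = s
        depths.append((left - score) + (right - score))
    return depths


gaps = gap_scores(BLOCKS)
print([round(g, 3) for g in gaps])        # similarity at B1–B2 and B2–B3
print([round(d, 3) for d in depth_scores(gaps)])
```

Under these assumptions the deeper valley falls at the B1–B2 gap, so that is where a boundary would be placed. A fuller implementation would also smooth the gap scores and derive the cutoff from the distribution of depth scores rather than comparing two values by eye.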

51 of 52

Position  Depth  Boundary
B1–B2     0.14   YES
B2–B3     small  NO

TextTiling Method

52 of 52

  • Final Segmentation
      • [The bank offers loans and savings accounts. It provides credit cards and financial services. Customers can open accounts easily.]
      • --- Topic Shift ---
      • [The river bank was flooded after heavy rain. Water levels increased near the river. The area was evacuated due to flooding.]

TextTiling Method