Module 2

Word Level Analysis & Syntactic Analysis

Content:

Word Level Analysis: Regular Expressions, Finite-State Automata, Morphological Parsing, Spelling Error Detection and Correction, Words and Word Classes, Part-of-Speech Tagging.

Syntactic Analysis: Context-Free Grammar, Constituency, Top-down and Bottom-up Parsing, CYK Parsing.

Textbook 1: Ch. 3, Ch. 4

Chapter 3 - Word Level Analysis

  • Regular Expressions

Regular expressions (RegEx) are sequences of characters used to find or replace patterns within text. They are essential tools in Natural Language Processing (NLP) for tasks such as data pre-processing, pattern matching, text feature engineering, web scraping, and data extraction.
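
The find, replace, and extract uses mentioned above can be sketched with Python's standard `re` module. The sample text and the patterns are assumptions made up for illustration; in particular, the e-mail pattern is deliberately simple and is not a full address validator.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Contact us at info@example.com or  support@example.org  by 2024-05-01."

# Find: a deliberately simple e-mail pattern (not a complete validator).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# Replace: collapse runs of whitespace into a single space (pre-processing).
normalized = re.sub(r"\s+", " ", text).strip()

# Extract: capture year, month, and day from ISO-style dates (YYYY-MM-DD).
dates = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)

print(emails)      # ['info@example.com', 'support@example.org']
print(normalized)
print(dates)       # [('2024', '05', '01')]
```

`re.findall` returns plain strings when the pattern has no capture groups and tuples when it has several, which is why `emails` and `dates` have different shapes.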

3.3 FINITE AUTOMATA

3.4 MORPHOLOGICAL PARSING

What is Morphological Parsing?

Morphological parsing is the process of breaking down a word into its morphemes, which are the smallest units of meaning. This process helps in understanding the word's structure and its role in a sentence.
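
As a rough illustration of breaking a word into a stem and an affix, here is a minimal suffix-stripping sketch. The stem list, the suffix table, and the feature labels (`+PresPart`, `+Past`, `+PluralOr3sg`) are hypothetical and far smaller than any real lexicon; practical morphological parsers are usually built on finite-state transducers rather than on this kind of lookup.

```python
# Hypothetical toy lexicon and suffix table, for illustration only.
STEMS = {"run", "walk", "cat", "play"}
SUFFIXES = {"ing": "+PresPart", "ed": "+Past", "s": "+PluralOr3sg"}


def parse(word):
    """Return every (stem, feature) analysis that the toy rules allow."""
    analyses = []
    if word in STEMS:
        analyses.append((word, "+Base"))
    for suffix, feature in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        candidates = [stem]
        # Undo consonant doubling: "running" -> "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            candidates.append(stem[:-1])
        for candidate in candidates:
            if candidate in STEMS:
                analyses.append((candidate, feature))
    return analyses


print(parse("running"))  # [('run', '+PresPart')]
print(parse("cats"))     # [('cat', '+PluralOr3sg')]
print(parse("walked"))   # [('walk', '+Past')]
```

A word that matches no stem, such as `parse("xyz")`, yields an empty list, which a real system would treat as an unknown word.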

Why is it Important?

Morphological parsing is crucial for various NLP tasks, such as:

  • Part-of-Speech Tagging: Identifying the grammatical category of words.
  • Named Entity Recognition: Recognizing proper names, dates, and other entities.
  • Machine Translation: Translating languages with different morphological structures.
  • Text Normalization: Converting text into a standard format (e.g., converting “running” to “run”).

3.5 Spelling Error Detection and Correction

What is Spelling Error Detection and Correction?

Spelling error detection identifies mistakes in written text, and spelling error correction replaces them with the intended words. These errors can include:

  • Typographical Errors: Mistakes in typing, e.g., “teh” instead of “the.”
  • Contextual Errors: Correctly spelled words used in the wrong context, e.g., “their” instead of “there.”
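
One common way to handle typographical errors is to compare an unknown word against a dictionary using an edit distance. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, chosen here as an assumption because it counts a swap of adjacent letters ("teh" for "the") as a single edit. The five-word vocabulary and the `suggest` helper are hypothetical.

```python
def osa_distance(a, b):
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "eh" <-> "he".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"the", "there", "their", "then", "this"}


def suggest(word, max_distance=2):
    """Return in-vocabulary candidates for an unknown word, closest first."""
    if word in VOCAB:
        return []  # the word is spelled correctly as far as the dictionary knows
    scored = sorted((osa_distance(word, v), v) for v in VOCAB)
    return [v for dist, v in scored if dist <= max_distance]


print(suggest("teh"))    # ['the', 'then']
print(suggest("their"))  # []
```

The second call shows the limit of dictionary lookup: "their" is a valid word, so a contextual error such as "their" for "there" passes undetected and needs a method that looks at the surrounding words.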

Why is it Important?

Accurate text is essential for effective communication. Spelling error detection and correction is crucial for:

  • Improving Readability: Ensuring that text is easy to read and understand.
  • Enhancing Professionalism: Correct text reflects positively on the writer or organization.
  • Enabling Other NLP Tasks: Many NLP tasks, such as machine translation and sentiment analysis, rely on error-free text.

3.6 WORDS AND WORD CLASSES

3.7 Part-of-Speech Tagging

Part-of-Speech (POS) tagging is a fundamental NLP task that assigns a part of speech to each word in a sentence.

What is POS Tagging?

POS tagging is the process of identifying the grammatical category of each word in a given text, such as nouns, verbs, adjectives, etc.

Why is it Important?

POS tagging is crucial for various NLP applications:

  • Syntax Analysis: Helps in understanding the grammatical structure of sentences.
  • Information Extraction: Facilitates identifying key elements in the text.
  • Sentiment Analysis: Assists in understanding the sentiment of a text by identifying descriptive words.
  • Machine Translation: Helps in translating text accurately by understanding the role of each word.

How Does it Work?

POS tagging can be performed using several techniques:

  • Rule-Based Methods: Use a set of hand-crafted rules to determine the POS tags.
  • Statistical Methods: Use probabilistic models like Hidden Markov Models (HMM) to predict the POS tags based on the likelihood of word sequences.
  • Machine Learning: Employ supervised learning techniques, where models like decision trees, support vector machines, or neural networks are trained on annotated corpora.
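
To make the statistical approach concrete, here is a minimal sketch of Viterbi decoding for a Hidden Markov Model tagger. All probabilities below are hypothetical values picked by hand for illustration; a real tagger would estimate them from an annotated corpus and use proper smoothing instead of the crude `UNSEEN` constant.

```python
import math

# Hypothetical toy parameters, hand-picked (not estimated from any corpus).
TAGS = ["DT", "NN", "VBZ"]
START = {"DT": 0.6, "NN": 0.3, "VBZ": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VBZ": 0.6},
    "VBZ": {"DT": 0.5, "NN": 0.4, "VBZ": 0.1},
}
EMIT = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.4, "fox": 0.4, "barks": 0.1},
    "VBZ": {"barks": 0.6, "jumps": 0.4},
}
UNSEEN = 1e-6  # crude stand-in for smoothing of unseen (tag, word) pairs


def viterbi(words):
    """Return the most likely tag sequence for words under the toy HMM."""
    if not words:
        return []
    # Each column maps tag -> (best log-probability so far, previous tag).
    columns = [{
        t: (math.log(START[t]) + math.log(EMIT[t].get(words[0], UNSEEN)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(word, UNSEEN))
            column[t] = max(
                (columns[-1][p][0] + math.log(TRANS[p][t]) + emit, p)
                for p in TAGS
            )
        columns.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dog", "barks"]))  # ['DT', 'NN', 'VBZ']
```

Working in log space avoids numeric underflow on longer sentences. Note that "barks" can be emitted by both NN and VBZ here; the transition from NN to VBZ is what tips the decision.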

Example

Consider the sentence: "The quick brown fox jumps over the lazy dog."

  • The: Determiner (DT)
  • quick: Adjective (JJ)
  • brown: Adjective (JJ)
  • fox: Noun (NN)
  • jumps: Verb (VBZ)
  • over: Preposition (IN)
  • the: Determiner (DT)
  • lazy: Adjective (JJ)
  • dog: Noun (NN)
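
The tags listed above can be reproduced with a very small rule-based tagger. This is only a sketch: the lexicon, the "ends in s after a noun" rule, and the noun default are assumptions chosen to fit this one sentence, and they would mis-tag most real text.

```python
# Hypothetical closed-class lexicon, for illustration only.
LEXICON = {"the": "DT", "quick": "JJ", "brown": "JJ", "lazy": "JJ", "over": "IN"}


def tag(sentence):
    """Tag each token with a lexicon lookup plus two hand-written fallback rules."""
    tagged = []
    for token in sentence.rstrip(".").split():
        word = token.lower()
        if word in LEXICON:
            pos = LEXICON[word]
        elif word.endswith("s") and tagged and tagged[-1][1] == "NN":
            pos = "VBZ"  # rule: an s-final word right after a noun is a verb
        else:
            pos = "NN"   # rule: unknown words default to noun
        tagged.append((token, pos))
    return tagged


for token, pos in tag("The quick brown fox jumps over the lazy dog."):
    print(f"{token}: {pos}")
```

The default-to-noun rule is a common baseline for unknown words because nouns are the largest open class in English.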

Chapter 4 - Syntactic Analysis

4.2 CONTEXT-FREE GRAMMAR (CFG)

PARSING
