Natural Language Processing
UNIT – I: Introduction
INTRODUCTION
Phases of NLP
Lexical Analysis
The first phase is lexical analysis/morphological processing. In this phase, paragraphs and sentences are broken into tokens.
• Tokens are the smallest units of text. The lexical analyzer scans the entire source text and divides it into meaningful lexemes.
• For example, the sentence “He goes to college.” is divided into [ ‘He’ , ‘goes’ , ‘to’ , ‘college’, ‘.’] .
• There are five tokens in the sentence. A paragraph may also be divided into sentences.
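The splitting described above can be sketched with a single regular expression. This is a minimal illustration, assuming that a token is either a run of letters, digits and apostrophes or a single punctuation mark; the function name `tokenize` is hypothetical, and real tokenizers handle many more cases (hyphens, abbreviations, URLs).

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumed rule: a run of letters/digits/apostrophes is one token, and any
    # other non-space character (e.g. punctuation) is a token of its own.
    # Whitespace matches neither alternative, so it is skipped.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

tokens = tokenize("He goes to college.")
print(tokens)       # ['He', 'goes', 'to', 'college', '.']
print(len(tokens))  # 5
```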
Syntactic Analysis/Parsing
The second phase is Syntactic Analysis. In this phase, the sentence is checked to see whether it is well formed.
• The arrangement of the words is studied, the syntactic relationships between them are identified, and the sentence is checked against the rules of grammar.
• For example, the sentence “Delhi goes to him” is rejected by the syntactic parser.
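A syntactic check can be sketched as a small recursive-descent parser over a toy grammar. The lexicon, the category names and the rules `S -> NP V PP`, `PP -> P NP`, `NP -> PRON | N` are assumptions made only for illustration. Note that a purely structural check like this one accepts “Delhi goes to him”, because its word categories fit the pattern; rejecting that sentence would need extra constraints on which words may fill each slot (for example, an animate subject for “goes”), which this sketch does not model.

```python
# Hypothetical toy lexicon: word -> grammatical category.
LEXICON = {
    "he": "PRON", "him": "PRON",
    "delhi": "N", "college": "N",
    "goes": "V", "to": "P",
}

# Toy grammar (assumed for illustration):
#   S  -> NP V PP
#   PP -> P NP
#   NP -> PRON | N
def parse_np(tags, i):
    # Return the index just after an NP starting at i, or None if there is none.
    if i < len(tags) and tags[i] in ("PRON", "N"):
        return i + 1
    return None

def parse_pp(tags, i):
    # A PP is a preposition followed by an NP.
    if i < len(tags) and tags[i] == "P":
        return parse_np(tags, i + 1)
    return None

def is_grammatical(sentence):
    words = sentence.lower().rstrip(".").split()
    if any(w not in LEXICON for w in words):
        return False  # unknown word: cannot be parsed
    tags = [LEXICON[w] for w in words]
    i = parse_np(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "V":
        return False
    i = parse_pp(tags, i + 1)
    return i == len(tags)  # the whole sentence must be consumed

print(is_grammatical("He goes to college."))  # True
print(is_grammatical("goes He college to"))   # False
```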
Semantic Analysis
The third phase is Semantic Analysis. In this phase, the sentence is checked for the literal meaning of each word and of the words taken together.
• For example, the sentence “I ate hot ice cream” will be rejected by the semantic analyzer because it does not make sense.
• Similarly, “colorless green idea” would be rejected by the semantic analyzer: here, “colorless” and “green” contradict each other, so the phrase does not make sense.
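The rejection of such phrases can be sketched as feature checking: each adjective assigns a value to an attribute, a noun may carry inherent values, and a phrase is rejected when two values for the same attribute clash. The feature lexicon and the function `is_meaningful` are hypothetical; real semantic analyzers rely on much richer knowledge.

```python
# Hypothetical feature lexicon: adjective -> (attribute, value).
ADJECTIVES = {
    "hot": ("temperature", "hot"),
    "cold": ("temperature", "cold"),
    "green": ("color", "green"),
    "colorless": ("color", "none"),
}
# Hypothetical nouns with their inherent attribute values.
NOUNS = {
    "ice cream": {"temperature": "cold"},
    "tea": {},
    "idea": {},
}

def is_meaningful(adjectives, noun):
    features = dict(NOUNS[noun])  # copy the noun's inherent features
    for adj in adjectives:
        attr, value = ADJECTIVES[adj]
        # A clash means the same attribute would get two different values.
        if features.get(attr, value) != value:
            return False
        features[attr] = value
    return True

print(is_meaningful(["hot"], "ice cream"))             # False
print(is_meaningful(["colorless", "green"], "idea"))   # False
print(is_meaningful(["hot"], "tea"))                   # True
```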
Discourse Integration
Pragmatic Analysis
NLP Implementation
The following are popular methods used in Natural Language Processing:
How to Perform NLP?
– Segmentation
– Tokenizing
– Removing Stop Words
– Stemming
– Lemmatization
– Part of Speech Tagging
– Named Entity Tagging
Segmentation
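Segmentation splits a paragraph into sentences. A minimal rule-based sketch, assuming that a sentence ends with ‘.’, ‘!’ or ‘?’ followed by whitespace and an uppercase letter; the function name `split_sentences` is hypothetical.

```python
import re

def split_sentences(paragraph: str) -> list[str]:
    # Split at whitespace that follows '.', '!' or '?' and precedes an
    # uppercase letter (assumed sentence-boundary rule).
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    return [p for p in parts if p]

print(split_sentences("He goes to college. He studies NLP. Does he like it?"))
# ['He goes to college.', 'He studies NLP.', 'Does he like it?']
```

This rule is deliberately simple: an abbreviation such as “Dr. Smith” would be split in the wrong place, which is why practical segmenters add exception lists or trained models.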
Tokenizing
Removing Stop Words
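Stop words are very frequent words (articles, prepositions, pronouns) that carry little content and are often dropped before further processing. A minimal sketch, assuming a small hand-picked stop-word list; libraries such as NLTK ship much longer lists, and the names here are hypothetical.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "he", "she", "it"}

def remove_stop_words(tokens):
    # Compare case-insensitively but keep each kept token's original spelling.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["He", "goes", "to", "college", "."]))
# ['goes', 'college', '.']
```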
Stemming
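Stemming chops suffixes off a word to reach a common stem, which need not be a dictionary word. The sketch below is a crude suffix stripper written for illustration; it is not the Porter stemmer, and the suffix list and the three-letter minimum stem length are assumptions.

```python
# Hypothetical suffix list, ordered longest first so "edly" is tried before "ed".
SUFFIXES = ("edly", "ing", "ed", "ly", "es", "s")

def stem(word):
    w = word.lower()
    for suffix in SUFFIXES:
        # Strip only if at least three letters of stem would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

print([stem(w) for w in ["playing", "played", "plays", "studies"]])
# ['play', 'play', 'play', 'studi']
```

Note that “studies” becomes “studi”, which is not a word; this is typical of stemming and is the main contrast with lemmatization.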
Lemmatization
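Lemmatization maps a word to its dictionary form (lemma), so irregular forms such as “went” map to “go”. A minimal sketch, assuming a tiny hand-written dictionary of irregular forms plus one fallback rule for regular plurals; the names `LEMMAS` and `lemmatize` are hypothetical, and real lemmatizers also use the word's part of speech.

```python
# Hypothetical mini-dictionary: inflected form -> lemma.
LEMMAS = {
    "went": "go", "goes": "go", "gone": "go",
    "mice": "mouse", "was": "be", "were": "be", "is": "be",
    "studies": "study",
}

def lemmatize(word):
    w = word.lower()
    if w in LEMMAS:
        return LEMMAS[w]
    # Assumed fallback: strip a regular plural -s, but not from words in -ss.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

print([lemmatize(w) for w in ["went", "mice", "studies", "colleges", "class"]])
# ['go', 'mouse', 'study', 'college', 'class']
```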
Part of Speech Tagging
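Part-of-speech tagging assigns a grammatical category to each token. The sketch below is a baseline lookup tagger, assuming a tiny hypothetical lexicon with Penn Treebank-style tags and a few suffix rules for unknown words. Unlike practical taggers (for example, HMM-based ones), it ignores context, so it cannot resolve words that are ambiguous between categories.

```python
# Hypothetical lexicon using Penn Treebank-style tags.
LEXICON = {"he": "PRP", "goes": "VBZ", "to": "TO", "college": "NN", ".": "."}

def pos_tag(tokens):
    tagged = []
    for tok in tokens:
        w = tok.lower()
        if w in LEXICON:
            tag = LEXICON[w]
        elif w.endswith("ing"):
            tag = "VBG"   # assumed rule: -ing words are gerunds/participles
        elif w.endswith("ly"):
            tag = "RB"    # assumed rule: -ly words are adverbs
        elif tok[:1].isupper():
            tag = "NNP"   # assumed rule: capitalized unknowns are proper nouns
        else:
            tag = "NN"    # default: common noun
        tagged.append((tok, tag))
    return tagged

print(pos_tag(["He", "goes", "to", "college", "."]))
# [('He', 'PRP'), ('goes', 'VBZ'), ('to', 'TO'), ('college', 'NN'), ('.', '.')]
```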
Named Entity Tagging
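Named entity tagging marks spans of text that name people, places, organizations and so on. A minimal gazetteer (name-list) sketch, assuming a hypothetical hand-made list of names; it uses longest match so that a multi-word name such as “New Delhi” wins over the shorter name inside it.

```python
# Hypothetical gazetteer: lowercase token tuple -> entity type.
GAZETTEER = {
    ("new", "delhi"): "LOCATION",
    ("delhi",): "LOCATION",
    ("dave",): "PERSON",
    ("google",): "ORGANIZATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def tag_entities(tokens):
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no entity starts here; move to the next token
    return entities

print(tag_entities(["Dave", "flew", "to", "New", "Delhi", "."]))
# [('Dave', 'PERSON'), ('New Delhi', 'LOCATION')]
```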
Applications of NLP
1.1 Knowledge in Speech and Language Processing
HAL, the pod bay door is open.
HAL, is the pod bay door open?
I’m I do, sorry that afraid Dave I’m can’t
(I’m sorry, Dave. I’m afraid I can’t do that.)
1.2 Ambiguity
1.3 Models and Algorithms
Regular Expressions
Use of Regular Expressions in NLP
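A typical use is searching text for a word. The sketch below uses Python's `re` module to find every standalone occurrence of “the”, in either case; the sample text is made up for illustration.

```python
import re

text = "The cat sat on the mat. Other animals were there too."
# [tT] accepts either case for the first letter; \b marks a word boundary,
# so the "the" inside "Other" and "there" is not matched.
matches = re.findall(r"\b[tT]he\b", text)
print(matches)  # ['The', 'the']
```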
Substitutions
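Substitution replaces whatever a pattern matches with new text; the classic notation `s/colour/color/` corresponds to `re.sub` in Python. A capture group `(...)` stores the matched text so the replacement can refer back to it as `\1`. The sample strings are made up for illustration.

```python
import re

# s/colour/color/ : replace every match of the pattern.
print(re.sub(r"colour", "color", "The colour of the sky"))  # The color of the sky

# s/([0-9]+)/<\1>/ : the capture group keeps the number, and \1 re-inserts it
# wrapped in angle brackets.
print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes"))  # the <35> boxes
```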
ELIZA substitutions using capture groups
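ELIZA-style responses can be sketched as an ordered list of (pattern, response) rules, where a capture group copies part of the user's input into the reply. The rules below are modeled on the well-known ELIZA examples, but the rule list and the function `eliza_reply` are hypothetical; the first matching rule wins.

```python
import re

# Hypothetical ELIZA-style rules: (pattern, response template).
RULES = [
    (r"\bI'?M (depressed|sad)\b", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r"\ball\b", "IN WHAT WAY"),
    (r"\balways\b", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def eliza_reply(utterance):
    for pattern, template in RULES:
        m = re.search(pattern, utterance, flags=re.IGNORECASE)
        if m:
            # expand() fills \1 with the text captured by the group.
            return m.expand(template).upper()
    return "PLEASE GO ON"  # default reply when no rule matches

print(eliza_reply("Men are all alike."))                # IN WHAT WAY
print(eliza_reply("I'm depressed much of the time."))   # I AM SORRY TO HEAR YOU ARE DEPRESSED
```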
Minimum Edit Distance
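Minimum edit distance is the smallest number of insertions, deletions and substitutions needed to turn one string into another. It is usually computed by dynamic programming over a table `D`, where `D[i][j]` is the distance between the first `i` characters of the source and the first `j` characters of the target. A minimal sketch; the `sub_cost` parameter is an assumption added so that both the unit-cost version and Levenshtein's alternative weighting (substitution costs 2) can be tried.

```python
def min_edit_distance(source, target, sub_cost=1):
    n, m = len(source), len(target)
    # D[i][j] = distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i  # turn source[:i] into "" with i deletions
    for j in range(1, m + 1):
        D[0][j] = j  # turn "" into target[:j] with j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            D[i][j] = min(
                D[i - 1][j] + 1,                              # deletion
                D[i][j - 1] + 1,                              # insertion
                D[i - 1][j - 1] + (0 if same else sub_cost),  # copy or substitution
            )
    return D[n][m]

print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```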