Introduction to Speech & Natural Language Processing
Lecture 5
Lexical Processing Applications
Krishnendu Ghosh
Authorship Attribution / Stylometry
Example: Distinguishing Shakespeare from Marlowe using lexical statistics.
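Stylometric attribution often compares how often each author uses common function words ("the", "of", "and"), because writers use these largely unconsciously. Below is a minimal sketch that assumes a small hypothetical function-word list and uses cosine similarity over relative frequencies. This is a simple stand-in for established measures such as Burrows' Delta, not the method behind any particular Shakespeare/Marlowe study. The sample texts in the tests are placeholders, not real corpus data.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; stylometric studies
# typically use the 100 or more most frequent words of the corpus.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "with", "but", "thou"]


def profile(text):
    """Relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute(disputed, candidates):
    """Return the candidate author whose function-word profile is closest."""
    target = profile(disputed)
    return max(candidates, key=lambda author: cosine(target, profile(candidates[author])))
```

In practice, each entry of `candidates` would map an author's name to a long sample of undisputed writing. Short texts give very noisy frequency estimates.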
Language Identification
Example: Distinguishing among Kannada, Hindi, and English tweets.
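For these three languages, a useful first cue is the writing system: Kannada has its own Unicode block (U+0C80 to U+0CFF), Hindi is usually written in Devanagari (U+0900 to U+097F), and English in Latin letters. The sketch below is a minimal majority-vote script detector built on that assumption. It will mislabel romanized Hindi or Kannada ("kya haal hai"), which is common on Twitter, and it cannot separate Hindi from other Devanagari languages such as Marathi. Character n-gram models are the usual next step for those cases.

```python
from collections import Counter
from typing import Optional


def script_label(ch: str) -> Optional[str]:
    """Map one character to a language label by its Unicode block."""
    cp = ord(ch)
    if 0x0C80 <= cp <= 0x0CFF:  # Kannada block
        return "Kannada"
    if 0x0900 <= cp <= 0x097F:  # Devanagari block (also used by Marathi, Nepali, ...)
        return "Hindi"
    if ch.isascii() and ch.isalpha():  # basic Latin letters
        return "English"
    return None  # digits, punctuation, emoji, other scripts


def identify(text: str) -> str:
    """Majority vote over the script labels of all letters in the text."""
    votes = Counter(label for label in map(script_label, text) if label)
    return votes.most_common(1)[0][0] if votes else "unknown"
```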
Spell Checking and Correction
Example: Correcting “recieve” → “receive” using minimum edit distance.
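Minimum edit distance (Levenshtein distance) is the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. It is computed by dynamic programming over prefixes. The sketch below picks the closest word from a small hypothetical vocabulary and breaks ties alphabetically. Plain Levenshtein counts the "ie" to "ei" swap as 2 substitutions. The Damerau variant counts a transposition as 1. Real spell checkers also rank candidates by word frequency or a noisy-channel model, which is not shown here.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[m][n]


def correct(word: str, vocabulary) -> str:
    """Return `word` if known, else the vocabulary entry at minimum edit distance."""
    if word in vocabulary:
        return word
    return min(vocabulary, key=lambda w: (edit_distance(word, w), w))
```

With a larger vocabulary, "relieve" is only 1 edit away from "recieve" (c to l), so distance alone would prefer it over "receive". This is why frequency or context information matters.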
Text Normalization for Search Engines
Example: Searching “running shoes” returns results for “run” and “runners”.
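Matching "running" with "run" and "runners" works because the query and the documents are normalized to the same index terms before lookup. The sketch below uses a hypothetical toy suffix stripper (lowercasing, removing "ing", "ers", "er", or "s", then undoing a doubled final consonant). It is not the Porter stemmer or any library's algorithm. The matching terms go through a small inverted index with OR semantics.

```python
import re
from collections import defaultdict

# Hypothetical toy suffix list: NOT the Porter stemmer, just enough for the example.
SUFFIXES = ("ing", "ers", "er", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
            # Undo consonant doubling exposed by stripping, e.g. "runn" -> "run".
            if token[-1] == token[-2] and token[-1] not in "aeiou":
                token = token[:-1]
            break
    return token


def normalize(text: str):
    """Lowercase, split on non-letters, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_index(docs):
    """Inverted index: normalized term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(query, index):
    """Ids of documents containing any normalized query term."""
    hits = set()
    for term in normalize(query):
        hits |= index.get(term, set())
    return hits
```

Aggressive stemming improves recall but can conflate unrelated words. Lemmatization is the more conservative alternative.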
Keyword Extraction and Indexing
Example: Auto-generating tags for research papers or news articles.
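A common baseline for keyword extraction is TF-IDF. It ranks terms that are frequent in one document but rare across the collection, so generic words score low. Below is a minimal sketch assuming a small hypothetical stopword list, raw term frequency normalized by document length, and idf = log(N / df). Many other TF-IDF weightings exist, and libraries differ in their exact formulas.

```python
import math
import re
from collections import Counter

# Hypothetical small stopword list for the example.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "on", "is", "we", "with"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs, k=3):
    """Return the k highest TF-IDF terms of each document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results
```

A term that occurs in every document gets idf = log(1) = 0 and is never chosen as a tag.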
Sentiment Analysis (Lexicon-Based)
Example: Counting words like “happy”, “great”, “terrible” to score movie reviews.
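Lexicon-based scoring sums prior polarity values of the words found in a text. The sketch below uses a hypothetical toy lexicon (published lexicons such as VADER or SentiWordNet are far larger and graded). It adds one simple extension that the slide does not mention: flipping the polarity of a word immediately preceded by a negator, so that "not great" counts as negative.

```python
import re

# Hypothetical toy lexicon with integer polarity scores.
LEXICON = {"happy": 1, "good": 1, "great": 2, "bad": -1, "boring": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}


def sentiment_score(review: str) -> int:
    """Sum lexicon scores, flipping a word's sign after a negator."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        # Flip polarity if the immediately preceding word is a negator ("not great").
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(review: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    s = sentiment_score(review)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

Word counting misses sarcasm, domain-specific senses, and longer-range negation. These are the usual motivations for learned classifiers.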
Information Retrieval (IR) and Search Optimization
Example: Google search uses normalized tokens for query expansion.
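Query expansion adds related terms to a normalized query so that documents using different wording can still match. The sketch below lowercases the query, strips punctuation, and appends synonyms from a hypothetical hand-written table. It illustrates the idea only and does not describe how Google or any other engine actually expands queries. Real systems derive expansions from query logs, thesauri such as WordNet, or embeddings.

```python
import re

# Hypothetical synonym table for illustration only.
SYNONYMS = {"cheap": {"affordable", "inexpensive"}, "laptop": {"notebook"}}


def normalize(query: str):
    """Lowercase and keep alphanumeric tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", query.lower())


def expand(query: str):
    """Each normalized token, followed by its synonyms in sorted order."""
    terms = []
    for tok in normalize(query):
        terms.append(tok)
        terms.extend(sorted(SYNONYMS.get(tok, ())))
    return terms
```

Normalization has to happen first: without lowercasing, "LAPTOP" would not hit the "laptop" entry.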
Plagiarism Detection
Example: Detecting copied or rephrased content using token similarity.
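A standard lexical baseline for copy detection splits each text into overlapping word n-grams ("shingles") and measures overlap with the Jaccard coefficient |A ∩ B| / |A ∪ B|. The sketch below assumes word trigrams by default. Verbatim copying scores high. Heavily rephrased text usually needs smaller n, stemming or synonym normalization, or semantic similarity on top of this.

```python
import re


def shingles(text: str, n: int = 3):
    """Set of overlapping word n-grams (empty if the text has fewer than n words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(doc1: str, doc2: str, n: int = 3) -> float:
    """Jaccard similarity of the two documents' n-gram shingle sets."""
    return jaccard(shingles(doc1, n), shingles(doc2, n))
```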
Speech Transcription Post-Processing
Example: Converting “three point five percent” → “3.5%”.
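Converting spoken-form output into written form is usually called inverse text normalization (ITN). Production ITN is often built with finite-state grammars. The sketch below is a toy rule-based version under narrow assumptions. It handles only integers 0 to 99, an optional "point" followed by digit words, and an optional "percent". It also adds consecutive number words naively, so it is correct for "twenty one" but not for digit strings such as "one two".

```python
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
DIGITS = {word for word, value in UNITS.items() if value < 10}
NUMBER_WORDS = set(UNITS) | set(TENS)


def normalize_spoken(text: str) -> str:
    """Rewrite spoken numbers (0-99, optional decimals and 'percent') as digits."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] not in NUMBER_WORDS:
            out.append(tokens[i])
            i += 1
            continue
        # Integer part: add up consecutive number words, so "twenty one" -> 21.
        # (Naive: "one two" would also become 3; real ITN grammars are stricter.)
        value = 0
        while i < len(tokens) and tokens[i] in NUMBER_WORDS:
            value += UNITS.get(tokens[i], TENS.get(tokens[i], 0))
            i += 1
        number = str(value)
        # Fractional part: "point" followed by digit words, read one digit at a time.
        if i + 1 < len(tokens) and tokens[i] == "point" and tokens[i + 1] in DIGITS:
            i += 1
            fraction = ""
            while i < len(tokens) and tokens[i] in DIGITS:
                fraction += str(UNITS[tokens[i]])
                i += 1
            number += "." + fraction
        if i < len(tokens) and tokens[i] == "percent":
            number += "%"
            i += 1
        out.append(number)
    return " ".join(out)
```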
Text Simplification
Example: Creating simplified text for children or second-language learners.
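The simplest lexical form of simplification replaces rare or formal words with common equivalents. The sketch below assumes a small hypothetical substitution table and preserves an initial capital letter. It does not handle inflected forms ("utilized"), word-sense ambiguity, or the sentence splitting and restructuring that fuller simplification systems also perform.

```python
import re

# Hypothetical complex -> simple substitution table.
SIMPLER = {"utilize": "use", "commence": "start", "approximately": "about", "purchase": "buy"}


def simplify(sentence: str) -> str:
    """Replace listed complex words with simpler ones, keeping initial capitals."""
    def swap(match):
        word = match.group(0)
        simple = SIMPLER.get(word.lower())
        if simple is None:
            return word
        # Preserve capitalization of the first letter.
        return simple.capitalize() if word[0].isupper() else simple

    return re.sub(r"[A-Za-z]+", swap, sentence)
```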