Introduction To Natural Language Processing
Sreyan Ghosh
Deep Learning Solutions Architect @ NVIDIA
Researcher @ MIDAS Labs, IIIT Delhi; Speech Lab, IIT Madras
Phases of Innovation in Artificial Intelligence
The Various Domains of AI
NLP lies at the intersection of computational linguistics and machine learning.
Natural Language Processing (NLP)
Human language is special for several reasons. It is specifically constructed to convey the speaker's or writer's meaning. It is a complex system, yet young children can learn it remarkably quickly.
1. Ambiguity
2. Scale
3. Sparsity
4. Variation
5. Expressivity
6. Unmodeled Variables
7. Unknown representations
Ambiguity at multiple levels:
Why is NLP Difficult?
Applications of Text and Speech Processing
Level Of Linguistic Knowledge
Question Answering
Text Classification
Text Summarization
Language Modelling
Machine Translation
Sequence Tagging
Data Augmentation
Information Extraction
NER
Abstractive
Extractive
Sentiment Classification
Toxicity Classification
Text Processing
Syntactic analysis, also referred to as syntax analysis or parsing, is the process of analyzing natural language with the rules of a formal grammar. Grammatical rules are applied to categories and groups of words, not individual words. Syntactic analysis assigns a syntactic structure to text.
For example, a sentence includes a subject and a predicate where the subject is a noun phrase and the predicate is a verb phrase. Take a look at the following sentence: “The dog (noun phrase) went away (verb phrase).” Note how we can combine every noun phrase with a verb phrase. Again, it's important to reiterate that a sentence can be syntactically correct but not make sense.
Parsing refers to the formal analysis of a sentence by a computer into its constituents, which results in a parse tree showing their syntactic relation to one another in visual form, which can be used for further processing and understanding.
Semantic analysis is the process of understanding the meaning and interpretation of words, signs and sentence structure. This lets computers partly understand natural language the way humans do. I say partly because semantic analysis is one of the toughest parts of NLP and it's not fully solved yet.
Syntactic and Semantic Analysis
A Parse Tree
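As a concrete illustration, here is a minimal sketch of constituency parsing with NLTK. The toy grammar is an assumption written just for the example sentence above ("The dog went away"); a real parser would use a much richer grammar.

import nltk

# Toy grammar covering only the example sentence; the rules are illustrative
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V Adv
Det -> 'The'
N -> 'dog'
V -> 'went'
Adv -> 'away'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The dog went away".split()):
    tree.pretty_print()   # prints the parse tree as ASCII art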
NLP Before Deep Learning
Count Vectorizer
https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html
Random Forests
TF-IDF
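A minimal sketch of this classical pipeline with scikit-learn: bag-of-words counts or TF-IDF features feeding a random forest classifier. The documents, labels, and hyperparameters below are made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

docs = ["the dog went away", "the cat sat on the mat", "dogs and cats are pets"]
labels = [0, 1, 1]                               # toy labels, purely illustrative

counts = CountVectorizer().fit_transform(docs)   # raw term counts per document

vec = TfidfVectorizer()                          # counts reweighted by how rare a term is
X = vec.fit_transform(docs)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["the dog sat on the mat"])))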
A Basic Neural Network
LSTM
Neural Networks, Deep Learning and NLP
Word Embeddings
Why are word embeddings important?
Example: In web search, if a user searches for “Seattle motel”, we would like to match documents containing “Seattle hotel”.
But:
motel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
hotel = [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
These two vectors are orthogonal.
There is no natural notion of similarity for one-hot vectors!
Solution: Learn to encode similarity in vectors themselves.
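To make the orthogonality point concrete, a small NumPy sketch. The one-hot indices match the vectors above; the dense vectors are made-up numbers only to show the contrast that learned embeddings can provide.

import numpy as np

motel = np.zeros(15); motel[10] = 1          # one-hot, as above
hotel = np.zeros(15); hotel[7] = 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(motel, hotel))                  # 0.0 -- no similarity signal at all

# Dense, learned vectors can place similar words close together
motel_d = np.array([0.70, 0.20, -0.10])      # made-up values for illustration
hotel_d = np.array([0.65, 0.25, -0.05])
print(cosine(motel_d, hotel_d))              # close to 1.0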
Distributional semantics: A word’s meaning is given by the words that frequently appear close by. “You shall know a word by the company it keeps” (J. R. Firth 1957: 11).
When a word w appears in a text, its context is the set of words that appear nearby (within a fixed-size window). Use the many contexts of w to build up a representation of w.
Representing Word Vectors by their Context
https://jalammar.github.io/illustrated-word2vec/
https://machinelearninginterview.com/topics/natural-language-processing/what-is-the-difference-between-word2vec-and-glove/
Continuous Bag of Words (CBOW) and Skip-gram are two ways Word2Vec is trained.
Word2Vec and GloVe: Two popular ways of training word embeddings
Word2Vec: Words closer in meaning in general domain English are closer to each other in the multi-dimensional vector space.
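A minimal sketch of training Word2Vec with gensim on a toy corpus; sg=1 selects skip-gram and sg=0 selects CBOW. The corpus and hyperparameters are illustrative assumptions only; meaningful embeddings require large amounts of text.

from gensim.models import Word2Vec

# Toy corpus; a real model needs millions of tokens
sentences = [
    ["the", "dog", "went", "away"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "dog", "chased", "a", "cat"],
]

# sg=1 -> skip-gram (predict context from centre word); sg=0 -> CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["dog"][:5])                   # first few dimensions of the vector
print(model.wv.most_similar("dog", topn=3))  # nearest neighbours in the vector space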
Language Models
RNN-based Language Model
Transformer-based Language Model
Language Models are used for various use cases, including speech recognition, sentence scoring, novel sequence generation, and, more recently, word embedding generation.
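As a sketch of the simplest kind of language model, here is an add-one-smoothed bigram model that can score sentences. The tiny corpus is made up; real language models are trained on vastly more data.

from collections import Counter
import math

corpus = ["the dog went away", "the dog sat", "the cat sat on the mat"]
tokens = [["<s>"] + s.split() + ["</s>"] for s in corpus]

unigrams, bigrams = Counter(), Counter()
for sent in tokens:
    unigrams.update(sent[:-1])                # history counts
    bigrams.update(zip(sent, sent[1:]))

V = len(set(w for sent in tokens for w in sent))

def score(sentence):
    """Add-one smoothed bigram log-probability of a sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
        for a, b in zip(words, words[1:])
    )

print(score("the dog sat"), score("dog the sat"))  # the fluent word order scores higher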
Attention: The attention mechanism looks at an input sequence and decides at each step which other parts of the sequence are important. It sounds abstract, but consider an easy example: when reading this text, you focus on the word you are currently reading, while your mind still holds the important keywords of the text in memory to provide context. The same goes for machine translation.
The Attention Mechanism
Summary of Popular Attention Mechanisms
https://lilianweng.github.io/posts/2018-06-24-attention/
Transformers and Modern Day NLP
Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
https://ai.stackexchange.com/questions/20075/why-does-the-transformer-do-better-than-rnn-and-lstm-in-long-range-context-depen
Visual Representation of the Transformer Model
Self Attention Mechanism
The Mantra: “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.”
Visual Representation of how Query, Keys and Values Interact
https://data-science-blog.com/blog/2021/04/07/multi-head-attention-mechanism/
Self Attention Mechanism (Continued)
Visual Representation of Scaled Dot-Product Attention
Visual Representation of Matrix Multiplication Operations in Scaled Dot-Product Attention
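Putting the mantra above into code: a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The toy shapes and random inputs are assumptions for illustration only.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compare queries with keys, turn the scores into weights, reweight the values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # query-key relevance scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of the values

# Toy example: 3 query positions, 4 key/value positions, dimension 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)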
BERT – Bidirectional Encoder Representations from Transformers
BERT follows a two-step process: first, self-supervised pre-training on unlabeled data, and then fine-tuning on task-specific labelled data.
Masked Language Modelling (MLM)
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
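A quick way to see masked language modelling in action is the Hugging Face fill-mask pipeline; bert-base-uncased is used here only as a familiar example checkpoint.

from transformers import pipeline

# The model predicts the token hidden behind [MASK]
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The dog [MASK] away."):
    print(pred["token_str"], round(pred["score"], 3))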
You are currently looking at a slide; you can also slide down the stairs or play on a slide in a park. Word2Vec, however, learns only one “meaning” for a word. In this sense, Word2Vec produces static embeddings (vectors representing the word in question), since each word gets a single representation that does not change with the context in which it is used.
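A sketch of the contrast: extracting BERT's contextual vectors for "slide" in two different sentences (assuming "slide" is a single token in the bert-base-uncased vocabulary). Unlike a static Word2Vec vector, the two representations differ.

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence, word):
    """Return the hidden state of the first occurrence of `word` in `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

a = contextual_vector("you are looking at a slide in a lecture", "slide")
b = contextual_vector("children slide down the stairs in the park", "slide")
print(torch.cosine_similarity(a, b, dim=0))  # noticeably below 1: context matters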
Question Answering
Speech Synthesis
Speech Enhancement
Speaker Diarization
Speech Recognition
Speaker Verification
Speech Classification
Trigger Word Detection
Speaker Identity Verification
Emotion Recognition
Keyword Recognition
Noise Reduction
Natural TTS
Spoken Language Processing
Traditional Speech Recognition
Visual Representation of HMM-GMM based Speech Recognition
The arrows in the HMM represent phone transitions or links to observables. To model the audio features that we observe, a GMM is learned from the training data.
End-to-End Speech Recognition
Visual Representation of DeepSpeech2 Model Architecture [1]
Visual Representation of the Wav2Vec-2.0 Model Architecture [2], based on self-attention layers. Similar to BERT, Wav2Vec-2.0 follows the regime of pre-training on unlabeled data using SSL, followed by fine-tuning on labeled data.
[1] Hannun, Awni, et al. "Deep speech: Scaling up end-to-end speech recognition." arXiv preprint arXiv:1412.5567 (2014).
[2] Baevski, Alexei, et al. "wav2vec 2.0: A framework for self-supervised learning of speech representations." Advances in Neural Information Processing Systems 33 (2020): 12449-12460.
Text-based Language Models for Speech Recognition?
A Language Model scores the output transcript
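A toy sketch of how an external language model can help ASR: rescore an n-best list of transcripts by combining each hypothesis's acoustic score with an LM score. All numbers, the interpolation weight, and the stand-in LM are made up for illustration.

def rescore(hypotheses, lm_score, lm_weight=0.5):
    """Pick the transcript maximising acoustic log-prob + lm_weight * LM log-prob."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_score(h[0]))

# Toy n-best list: (transcript, acoustic log-probability) pairs
nbest = [
    ("i scream for ice cream", -12.3),
    ("eye scream four ice cream", -11.9),   # slightly better acoustically
]

# Stand-in LM scorer; in practice an n-gram or neural LM provides these numbers
toy_lm = lambda s: {"i scream for ice cream": -8.0,
                    "eye scream four ice cream": -15.0}.get(s, -30.0)

print(rescore(nbest, toy_lm))  # the LM tips the balance toward the fluent transcript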
Useful Resources
Mastering NLP
GET YOUR FUNDAMENTALS RIGHT.
LEARN A SINGLE MACHINE LEARNING AND DEEP LEARNING FRAMEWORK.
AI IS VAST, TRY TO MASTER A SINGLE DOMAIN.
READ MORE AND MORE PAPERS. TRY FOCUSING ON QUALITY OVER QUANTITY.
IDENTIFY A PROBLEM, READ A LOT OF LITERATURE AND PAST WORK RELATED TO IT.
https://research.com/conference-rankings/computer-science
Recent Trends in NLP Research
Self-Supervised Learning
ASR and NLU
Disfluency Detection
Emotion Recognition
Hate Speech Detection
Social Media Analysis
Thank You!!
Have A Great Semester Ahead
https://www.linkedin.com/in/sreyan-ghosh/
Word Embeddings