Multi-Modal Large Language Models
Sreyan Ghosh
About Me
Current: 2nd Year C.S. Ph.D. Student
Advisors: Dr. Dinesh Manocha and Dr. Ramani Duraiswami
Interests: Resource Efficient Deep Learning (applied to SLP)
Past -
https://sreyan88.github.io/
What are Large Language Models?
Large Language Models (LLMs) are large neural networks, trained on web-scale data, that act as language models: they assign probabilities to sequences of tokens.
(L)LMs come in various types.
Decoder-only LMs
Encoder-Decoder LMs
Encoder-Only LMs
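One way to see the difference between these three families is the attention pattern each one typically uses. The sketch below is a minimal illustration in plain Python, assuming the usual convention (causal self-attention for decoders, bidirectional self-attention for encoders, cross-attention from decoder to encoder output). The function names are illustrative and do not come from any library.

```python
def causal_mask(n):
    """Decoder-only: position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder-only: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def cross_attention_mask(n_target, n_source):
    """Encoder-decoder: each decoder position sees the whole encoded source.

    The decoder's own self-attention is still causal (see causal_mask).
    """
    return [[1] * n_source for _ in range(n_target)]


if __name__ == "__main__":
    # Print the lower-triangular pattern used by decoder-only LMs.
    for row in causal_mask(4):
        print(row)
```

A 1 means "may attend to", a 0 means "masked out"; real implementations usually apply the mask by adding a large negative value to the masked attention scores before the softmax.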
And Sizes.
Common Architectures
How do they work?
ChatGPT was released in November 2022
And achieved a lot!
And it kept getting better.
lmsys.org
Evolution of (Large) Language Models
GPT-4 was announced with image input in March 2023; GPT-4V rolled out in September 2023
What are Multi-Modal LLMs?
Multimodal language models are AI systems designed to understand, interpret, and generate information across different forms of data, such as text and images. These models leverage large datasets of annotated examples to learn associations between text and visual content, enabling them to perform tasks that require comprehension of both textual and visual information.
Timeline of MM-LLMs
Who gets it better?
The community has come a long way.
A generic architecture of MM-LLM
The LLM conditions on the previous text context plus the encoded input from the other modality.
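The slide's figure is not reproduced here, so the sketch below assumes the commonly used design: a modality encoder produces features, a projector (connector) maps them into the LLM's embedding space, and the projected "modality tokens" are prefixed to the text embeddings. Everything in it (the sizes `D_ENC` and `D_LLM`, the stand-in encoder, the random projector) is a toy assumption for illustration, not a real model.

```python
import random

D_ENC, D_LLM = 4, 6  # toy feature sizes (assumptions, not real model dims)


def encode_image(image_patches):
    """Stand-in for a frozen modality encoder: one D_ENC vector per patch."""
    return [[float(sum(patch)) * (k + 1) for k in range(D_ENC)]
            for patch in image_patches]


def make_projector(seed=0):
    """Random linear map D_ENC -> D_LLM; in practice this connector is trained."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(D_ENC)] for _ in range(D_LLM)]

    def project(vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return project


def embed_text(token_ids):
    """Stand-in for the LLM's token embedding table."""
    return [[float(t)] * D_LLM for t in token_ids]


def build_llm_input(image_patches, token_ids, project):
    """Prefix the projected modality tokens to the text embeddings."""
    modality_tokens = [project(v) for v in encode_image(image_patches)]
    return modality_tokens + embed_text(token_ids)


if __name__ == "__main__":
    sequence = build_llm_input([[1, 2], [3, 4], [5, 6]], [7, 8], make_projector())
    print(len(sequence), "input vectors of size", len(sequence[0]))
```

The design point is that the LLM itself is unchanged: it sees one sequence of vectors in its own embedding space and predicts the next token as usual.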
With time, open-source models kept getting better!
And applied to other domains!
Different Use-Cases of MM-LLMs
MM-LLMs for more modalities (Audio)
MM-LLMs for more modalities (Video)
Other modalities are catching up!
Graph LLMs
Time-series LLMs
The Dark Side.
(MM)LLMs Hallucinate Often.
Needle in the Haystack Problem.
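A needle-in-a-haystack test is usually set up by hiding one fact (the needle) at a chosen depth inside a long filler context and asking the model to retrieve it. The sketch below shows that setup under stated assumptions: `model` is a hypothetical callable that takes a prompt string and returns a reply string, and correctness is judged by simple substring match.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth in [0, 1] of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = int(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def retrieval_accuracy(model, filler_sentences, needle, question, answer, depths):
    """Fraction of depths at which the model's reply contains the answer.

    `model` is a hypothetical stand-in for whatever LLM call is being tested.
    """
    hits = 0
    for depth in depths:
        context = build_haystack(filler_sentences, needle, depth)
        reply = model(context + "\n\n" + question)
        hits += answer.lower() in reply.lower()
    return hits / len(depths)


if __name__ == "__main__":
    filler = ["Filler sentence number %d." % i for i in range(20)]
    # Toy "model" that only reads the first 200 characters of its prompt.
    short_sighted = lambda prompt: prompt[:200]
    score = retrieval_accuracy(
        short_sighted, filler, "The secret code is 4172.",
        "What is the secret code?", "4172", [0.0, 0.25, 0.5, 0.75, 1.0],
    )
    print("retrieval accuracy:", score)
```

Sweeping both the depth and the total context length gives the familiar grid of retrieval scores; the toy model here only illustrates how a failure at deeper positions would show up.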
Evaluation of LLMs is hard!
https://arxiv.org/pdf/2306.05685
Prior to 2022, NLP research focused largely on discriminative tasks, which can be scored against fixed labels.
a) Human evaluation is expensive.
b) Does GPT-4 know everything?
c) Non-determinism is always there.
d) What does GPT-4 use?
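When a strong LLM is used as the judge, two of the issues above (non-determinism and sensitivity to how answers are presented) can be partly mitigated by asking more than once. The sketch below is one possible approach, not a definitive protocol: it queries the judge with the two answers in both orders, keeps a winner only when both orders agree, and then takes a majority vote over repeated trials. `judge` is a hypothetical callable returning "A", "B", or "tie"; no real API is assumed.

```python
from collections import Counter


def swapped_verdict(judge, question, first, second):
    """Ask the judge in both orders; keep a winner only if both calls agree."""
    forward = judge(question, first, second)    # "A" means `first` wins
    backward = judge(question, second, first)   # "A" means `second` wins
    flipped = {"A": "B", "B": "A", "tie": "tie"}[backward]
    return forward if forward == flipped else "tie"


def majority_verdict(judge, question, first, second, trials=5):
    """Repeat the swapped comparison and return (verdict, agreement rate)."""
    votes = Counter(
        swapped_verdict(judge, question, first, second) for _ in range(trials)
    )
    verdict, count = votes.most_common(1)[0]
    return verdict, count / trials


if __name__ == "__main__":
    # Toy judge that always prefers whichever answer is shown first.
    position_biased = lambda question, a, b: "A"
    print(majority_verdict(position_biased, "q", "answer one", "answer two"))
```

A judge that simply prefers the first-listed answer ends up with a "tie" under this scheme, which is the intended behavior: an order-dependent preference is not counted as a win.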
Holistic alignment is also hard!
Alignment to human preferences covers safety alignment, factual alignment, engagement alignment, and more.
Lack of good quality training data!
Getting harder for academia!
Conclusion