NLP in 2023
Grigory Sapunov
DataFest Yerevan / 2023.09.09
gs@inten.to
0. Foundation Models
In recent years, a new successful paradigm for building AI systems has emerged: Train one model on a huge amount of data and adapt it to many applications. We call such a model a foundation model.
Foundation models (e.g., GPT-3) have demonstrated impressive behavior, but can fail unexpectedly, harbor biases, and are poorly understood. Nonetheless, they are being deployed at scale.
The Center for Research on Foundation Models (CRFM) is an interdisciplinary initiative born out of the Stanford Institute for Human-Centered Artificial Intelligence (HAI) that aims to make fundamental advances in the study, development, and deployment of foundation models.
Foundation Models
1. Prompt Engineering is a job
Prompt Engineer / JD
Prompting is tricky
Chain-of-Thought
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, https://arxiv.org/abs/2201.11903
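A minimal sketch of what few-shot chain-of-thought prompting looks like in code, assuming a hypothetical call_llm() completion function (the worked exemplar follows the paper's classic example):

COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def chain_of_thought(question, call_llm):
    # The worked exemplar demonstrates intermediate reasoning, nudging the model
    # to spell out its own steps before committing to a final answer.
    return call_llm(COT_PROMPT.format(question=question))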
Using tools
“Toolformer: Language Models Can Teach Themselves to Use Tools”, https://arxiv.org/abs/2302.04761
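A simplified sketch of the idea: the model emits an inline API call in its output, and a thin dispatcher executes it and substitutes the result. The [Calculator(...)] format loosely follows the paper; the regex dispatcher and the eval()-based calculator are illustrative simplifications.

import re

TOOL_CALL = re.compile(r"\[Calculator\((?P<expr>[^)]+)\)\]")

def execute_tool_calls(model_output):
    def run(match):
        # eval() is for illustration only; a real system needs a safe expression evaluator.
        return str(eval(match.group("expr"), {"__builtins__": {}}, {}))
    # Replace every inline Calculator call with its computed result.
    return TOOL_CALL.sub(run, model_output)

print(execute_tool_calls("Out of 1400 participants, 400 (or [Calculator(400 / 1400)]) passed the test."))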
Jailbreaks
Prompting resembles the good old NLP (in another form)
2. Commercial LLMs
> We no longer test LLMs on special ML datasets, we test them on Human exams… (a lot of anthropomorphizing is going on…)
GPT-4
Unknown characteristics, only rumors:
PaLM 2
Unknown characteristics, only rumors:
Anthropic Claude 2
Inflection AI
Other commercial LLMs
3. Open-source LLMs
> Instruction & Chat fine-tunings are very popular
LLaMA (7B-65B)
“LLaMA: Open and Efficient Foundation Language Models”
https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
LLaMA+: Alpaca, Vicuna, Guanaco, …
LLaMA 2 (7B-70B)
“Llama 2: Open Foundation and Fine-Tuned Chat Models”
https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
Falcon (40B, 180B)
Many more open-source LLMs
> There’s still a gap between commercial and open-source models
4. RL is widely used
RLHF (RL from Human Feedback)
“Training language models to follow instructions with human feedback”, https://arxiv.org/abs/2203.02155
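The core of the reward-model stage fits in a few lines: labelers pick the preferred of two model answers, and the reward model is trained so the chosen answer scores higher than the rejected one. A toy numpy sketch of that pairwise ranking loss (the scores below are placeholders, not real model outputs):

import numpy as np

def pairwise_ranking_loss(r_chosen, r_rejected):
    # InstructGPT-style reward-model objective: minimize -log(sigmoid(r_chosen - r_rejected)),
    # i.e. push the reward of the human-preferred answer above the rejected one.
    margin = r_chosen - r_rejected
    return float(np.mean(-np.log(1.0 / (1.0 + np.exp(-margin)))))

print(pairwise_ranking_loss(np.array([2.0, 1.5, 0.3]), np.array([1.0, 1.6, -0.2])))

The trained reward model then provides the signal for the RL (PPO) stage that fine-tunes the policy.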
RLAIF (RL from AI Feedback)
“Constitutional AI: Harmlessness from AI Feedback”, https://arxiv.org/abs/2212.08073
RLAIF & Constitutional AI
“Constitutional AI: Harmlessness from AI Feedback”, https://arxiv.org/abs/2212.08073
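A rough sketch of the supervised critique-and-revision loop at the heart of Constitutional AI, assuming a hypothetical call_llm() function and a single illustrative principle:

PRINCIPLE = "Choose the response that is most helpful, honest, and harmless."

def constitutional_revision(prompt, call_llm, n_rounds=2):
    response = call_llm(prompt)
    for _ in range(n_rounds):
        # The model critiques its own answer against the principle, then rewrites it.
        critique = call_llm(f"{PRINCIPLE}\nCritique this response to '{prompt}':\n{response}")
        response = call_llm(f"Rewrite the response to address this critique:\n{critique}\nOriginal response:\n{response}")
    return response

The revised answers become fine-tuning data, and at the RL stage AI-generated preference labels replace the human ones (hence RLAIF).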
5. LLMs go to Search
Bing + OpenAI
Google + Bard
You.com
IR-augmented LLMs are not a new thing
“LaMDA: Language Models for Dialog Applications”, https://arxiv.org/abs/2201.08239
6. Software ecosystem
Vector Databases
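A toy sketch of the retrieval pattern behind vector databases: embed documents, embed the query, return the nearest neighbors by cosine similarity. embed() here is a random placeholder standing in for a real embedding model, and real systems use approximate nearest-neighbor indexes instead of brute force.

import numpy as np

def embed(texts):
    # Placeholder embedding: random vectors instead of a real embedding model.
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 8))

def top_k(query, docs, k=2):
    doc_vecs, query_vec = embed(docs), embed([query])[0]
    # Cosine similarity between the query and every document vector.
    scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return [docs[i] for i in np.argsort(-scores)[:k]]

The retrieved passages are then pasted into the LLM prompt as grounding context.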
LLM Programs
“Large Language Model Programs”, https://arxiv.org/abs/2305.05364
Tree-of-Thought
“Large Language Model Guided Tree-of-Thought”, https://arxiv.org/abs/2305.08291
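In spirit, Tree-of-Thought explores several partial reasoning paths instead of a single chain. A much-simplified beam-style sketch, where propose() and score() stand in for LLM calls that generate and evaluate candidate thoughts:

def tree_of_thought(problem, propose, score, depth=3, beam=2):
    frontier = [""]  # partial reasoning chains, starting from an empty one
    for _ in range(depth):
        # Expand every surviving chain with several candidate next thoughts...
        candidates = [chain + "\n" + step for chain in frontier for step in propose(problem, chain)]
        # ...and keep only the highest-scoring branches.
        frontier = sorted(candidates, key=lambda c: score(problem, c), reverse=True)[:beam]
    return frontier[0]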
Graph of Thoughts
“Graph of Thoughts: Solving Elaborate Problems with Large Language Models”, https://arxiv.org/abs/2308.09687
Auto-GPT
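A minimal sketch of the agent loop such systems run: the LLM chooses an action, the action is executed, and the observation is appended to the context until the model declares it is done. call_llm and the tools mapping are hypothetical placeholders; real agents add planning, memory, and error handling.

def agent_loop(goal, call_llm, tools, max_steps=5):
    history = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = call_llm(history + "\nNext action as 'tool: input', or 'FINISH: answer'?")
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        tool, _, tool_input = decision.partition(":")
        observation = tools.get(tool.strip(), lambda x: "unknown tool")(tool_input.strip())
        history += f"\nAction: {decision}\nObservation: {observation}"
    return history  # fall back to the raw trace if no final answer was produced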
7. Code
Code LLaMA
“Introducing Code Llama, an AI Tool for Coding”
WizardCoder
Other LLMs for code
8. Context on Multilinguality
Multilinguality
Many models are now multilingual by default
Sometimes this happens without any special effort:
“OPT was not intentionally trained to be multilingual, but we found anecdotally it has limited success with simple translations in German, Spanish, French, and Chinese”
Curse of Multilinguality
“The experiments expose a trade-off as we scale the number of languages for a fixed model capacity: more languages leads to better cross-lingual performance on low-resource languages up until a point, after which the overall performance on monolingual and cross-lingual benchmarks degrades. We refer to this tradeoff as the curse of multilinguality, and show that it can be alleviated by simply increasing model capacity.”
“For a fixed sized model, the per-language capacity decreases as we increase the number of languages. While low-resource language performance can be improved by adding similar higher-resource languages during pretraining, the overall downstream performance suffers from this capacity dilution. Positive transfer and capacity dilution have to be traded off against each other.”
“Unsupervised Cross-lingual Representation Learning at Scale”, https://aclanthology.org/2020.acl-main.747/
[Meta] WMT 2021
For the first time, a single multilingual model has outperformed the best specially trained bilingual models across 10 out of 14 language pairs to win WMT (news translation task).
[Meta] NLLB-200
[Google] 1,000 Languages Initiative
“That’s why today we’re announcing the 1,000 Languages Initiative, an ambitious commitment to build an AI model that will support the 1,000 most spoken languages, bringing greater inclusion to billions of people in marginalized communities all around the world.
And our most advanced language models are multimodal – meaning they’re capable of unlocking information across these many different formats. With these seismic shifts come new opportunities.
As part of this initiative and our focus on multimodality, we’ve developed a Universal Speech Model — or USM — that’s trained on over 400 languages, making it the largest language coverage seen in a speech model to date.”
8a. Machine Translation (MT)
> MT is now being disrupted by LLMs
Prompting MT (our prediction in 2021)
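The appeal is that translation becomes just another prompt, so instructions like glossaries or tone can be added in plain language. A minimal sketch, with call_llm() as a hypothetical completion function and an illustrative prompt wording:

def translate(text, src, tgt, call_llm, glossary=None):
    prompt = f"Translate the following text from {src} to {tgt}."
    if glossary:
        # Domain terminology goes straight into the prompt as an instruction.
        prompt += "\nUse this glossary: " + "; ".join(f"{k} -> {v}" for k, v in glossary.items())
    return call_llm(prompt + f"\n\nText: {text}\nTranslation:")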
> Some MT systems might be replaced by LLMs soon
> Transcreation replaces Translation
> But MT and LLMs may still need each other
> MT helps LLM
Datasets are still dominated by English
English is still the major language in many datasets used to train LLMs.
So, not surprisingly, models solve tasks better in English than in other languages, especially low-resource ones.
LLMs may work better with translation
“Do Multilingual Language Models Think Better in English?”, https://arxiv.org/abs/2308.01223
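The paper studies whether it pays off to translate inputs into English and solve the task there. A minimal sketch of that translate-then-solve routing, with hypothetical translate() and call_llm() stand-ins (translating the answer back is an optional final step):

def answer_via_english(question, source_lang, translate, call_llm):
    english_question = translate(question, src=source_lang, tgt="English")
    english_answer = call_llm(english_question)  # reason in English, where the model is strongest
    return translate(english_answer, src="English", tgt=source_lang)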
All languages are NOT (tokenized) equal
Because of how tokenizers are built, the same text usually requires fewer tokens in English than in, say, Korean. Fewer tokens means a lower price and a longer effective context!
English has the shortest median token length
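A quick way to see the gap, assuming the tiktoken library and the cl100k_base encoding used by recent OpenAI models (the sample sentences are illustrative):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
samples = {
    "English": "Machine translation is changing quickly.",
    "Korean": "기계 번역은 빠르게 변화하고 있습니다.",
    "Armenian": "Մեքենայական թարգմանությունը արագ փոխվում է։",
}
for lang, text in samples.items():
    # The same sentence usually costs noticeably more tokens outside English.
    print(lang, len(enc.encode(text)))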
> LLM helps MT
Improving MT results with LLMs
There is a huge area of Source Quality Improvement and Automated Post-editing aimed at improving MT results.
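A sketch of the LLM-as-post-editor pattern: the raw MT output (plus the source for reference) is handed to an LLM with editing instructions. call_llm() is a hypothetical completion function and the prompt wording is illustrative:

def post_edit(source, mt_output, call_llm, style="formal"):
    prompt = (
        "You are a professional post-editor. Improve the machine translation below, "
        f"preserving the meaning of the source and using a {style} register.\n"
        f"Source: {source}\nMT output: {mt_output}\nPost-edited translation:"
    )
    return call_llm(prompt)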
9. Multimodality is a norm
GPT-4 with visual inputs
GPT-4 was trained in a multimodal setting (text + image) and can accept images as input.
This feature is not yet available in the API, though.
PaLM-E (Google)
“PaLM-E: An Embodied Multimodal Language Model”
Kosmos-1 (& 2): a multimodal LLM (MLLM)
“Language Is Not All You Need: Aligning Perception with Language Models”
OSS Flamingo: OpenFlamingo (9B)
OSS Flamingo: IDEFICS (9B, 80B)
[Google] Universal Speech Model (USM)
[Meta] SeamlessM4T
Risks: Voice cloning
Dynalang (DreamerV3 + LLM)
“Learning to Model the World with Language”, https://arxiv.org/abs/2308.01399
DeepMind Gato 2 is on the horizon?
“A Generalist Agent”, https://arxiv.org/abs/2205.06175
Image Generation
They’re also multimodal models:
[Runway] Generating video: Gen1 & Gen2
Modes:
10. Safety, Legal & Ethical issues
Misuse & Malicious use
Legal risks
Ethical problems
NPU Uprising: American Military's AI Cluster Goes Rogue, Authorities Scramble to Regain Control
> Push towards transparency & responsibility. But different approaches.
“Pause Giant AI Experiments: An Open Letter”
“Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? Such decisions must not be delegated to unelected tech leaders. Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.” … “Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”
“Time is running out: Demand responsible AI development!”
“Therefore, we call for:
“Keep up the progress tempo”
“Join us in our urgent mission to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models.”
…
“the open-source nature of this project will promote safety and security research, allowing potential risks to be identified and addressed more rapidly and transparently by the academic community and open-source enthusiasts. This is a vital step in ensuring the safety and reliability of AI technologies as they become increasingly integrated into our lives.”
“Include Consciousness Research”
“To understand whether AI systems are, or can become, conscious, tools are needed that can be applied to artificial systems. In particular, science needs to further develop formal and mathematical tools to model consciousness and its relationship to physical systems. In conjunction with empirical and experimental methods to measure consciousness, questions of AI consciousness must be tackled.”
More papers and discussions in our Telegram channel: https://t.me/gonzo_ML
BTW, the book is almost ready!
Thanks!