1 of 104

NLP in 2023

Grigory Sapunov

DataFest Yerevan / 2023.09.09

gs@inten.to

2 of 104

0. Foundation Models

3 of 104

In recent years, a new successful paradigm for building AI systems has emerged: Train one model on a huge amount of data and adapt it to many applications. We call such a model a foundation model.

Foundation models (e.g., GPT-3) have demonstrated impressive behavior, but can fail unexpectedly, harbor biases, and are poorly understood. Nonetheless, they are being deployed at scale.

The Center for Research on Foundation Models (CRFM) is an interdisciplinary initiative born out of the Stanford Institute for Human-Centered Artificial Intelligence (HAI) that aims to make fundamental advances in the study, development, and deployment of foundation models.

4 of 104

Foundation Models

5 of 104

Foundation Models

6 of 104

1. Prompt Engineering is a job

7 of 104

8 of 104

Prompt Engineer / JD

9 of 104

Prompting is tricky

10 of 104

Chain-of-Thought

“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, https://arxiv.org/abs/2201.11903
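
A minimal sketch of what a chain-of-thought prompt looks like in practice: instead of asking for the answer directly, the few-shot exemplar spells out the intermediate reasoning. The call_llm stub below stands in for any completion API; the exemplar is written in the style of the paper's examples.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real completion API call here.
    return ("They had 23 apples and used 20, leaving 3. They bought 6 more: "
            "3 + 6 = 9. The answer is 9.")

COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

print(call_llm(COT_PROMPT))  # the exemplar nudges the model to reason step by step before answering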

11 of 104

Using tools

“Toolformer: Language Models Can Teach Themselves to Use Tools”, https://arxiv.org/abs/2302.04761

12 of 104

Jailbreaks

13 of 104

Prompting resembles good-old (another) NLP

14 of 104

2. Commercial LLMs

15 of 104

> We no longer test LLMs on special ML datasets; we test them on human exams… (a lot of anthropomorphizing is going on…)

16 of 104

GPT-4

Unknown characteristics, only rumors:

  • Mixture Of Experts
  • ~1.8T parameters
  • but each forward step utilizes only ~280B parameters
  • trained on ~13T tokens
  • the 32k-context version is fine-tuned from the 8k version
  • training cost estimated at ~$20-60M

17 of 104

PaLM 2

Unknown characteristics, only rumors:

  • 340B params
  • Trained on 3.6T tokens

18 of 104

Anthropic Claude 2

  • 100k token window
  • Focus on safety
  • +Claude Instant, +Claude 1.3

19 of 104

Inflection AI

  • Worse than GPT-4 and PaLM 2, but it’s in a different compute class

20 of 104

Other commercial LLMs

21 of 104

3. Open-source LLMs

22 of 104

> Instruction & Chat fine-tunings are very popular

23 of 104

LLaMA (7B-65B)

“LLaMA: Open and Efficient Foundation Language Models”

https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

24 of 104

LLaMA+: Alpaca, Vicuna, Guanaco, …

25 of 104

LLaMA 2 (7B-70B)

26 of 104

Falcon (40B, 180B)

27 of 104

Many other open-source LLMs

  • MPT
  • Dolly 2.0
  • Cerebras-GPT
  • Phoenix
  • Platypus
  • Pythia
  • BLOOM
  • OPT
  • StableLM
  • XGen

28 of 104

> There’s still a gap between commercial and open-source models

29 of 104

30 of 104

4. RL is widely used

31 of 104

RLHF (RL from Human Feedback)

“Training language models to follow instructions with human feedback”, https://arxiv.org/abs/2203.02155
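
The first learned component of RLHF is a reward model trained on human preference pairs, which then drives RL fine-tuning (e.g., PPO) of the policy. Below is a minimal PyTorch sketch of that reward-model stage only; the RewardModel class, sizes, and random "embeddings" are illustrative stand-ins for a real LM backbone and real annotated pairs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: a linear head over a (prompt, response) embedding."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)  # one scalar reward per example

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random tensors stand in for embeddings of human-preferred and rejected responses.
chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)

# Pairwise (Bradley-Terry style) preference loss: maximize log sigmoid(r_chosen - r_rejected).
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"reward-model loss: {loss.item():.3f}")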

32 of 104

RLHF (RL from Human Feedback)

“Training language models to follow instructions with human feedback”, https://arxiv.org/abs/2203.02155

33 of 104

RLAIF (RL from AI Feedback)

“Constitutional AI: Harmlessness from AI Feedback”, https://arxiv.org/abs/2212.08073

34 of 104

RLAIF & Constitutional AI

“Constitutional AI: Harmlessness from AI Feedback”, https://arxiv.org/abs/2212.08073

35 of 104

5. LLMs go to Search

36 of 104

Bing + OpenAI

37 of 104

Google + Bard

38 of 104

You.com

39 of 104

IR-augmented LLMs are not a new thing

“LaMDA: Language Models for Dialog Applications”, https://arxiv.org/abs/2201.08239

40 of 104

6. Software ecosystem

41 of 104

  • 📃 LLMs and Prompts
  • 🔗 Chains
  • 📚 Data Augmented Generation
  • 🤖 Agents
  • 🧠 Memory
  • 🧐 Evaluation
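
These are the typical building blocks of LLM orchestration frameworks (the list mirrors LangChain's modules). Below is a framework-agnostic sketch of the "prompt template + chain" idea; PromptTemplate, Chain, and call_llm are my own placeholder names, not any library's API.

def call_llm(prompt: str) -> str:
    # Placeholder for a real completion call.
    return f"[model output for: {prompt[:40]}...]"

class PromptTemplate:
    """A reusable prompt with named slots."""
    def __init__(self, template: str):
        self.template = template

    def format(self, **kwargs) -> str:
        return self.template.format(**kwargs)

class Chain:
    """Run prompt steps in sequence, feeding each output into the next prompt."""
    def __init__(self, steps: list):
        self.steps = steps

    def run(self, text: str) -> str:
        for step in self.steps:
            text = call_llm(step.format(input=text))
        return text

summarize_then_translate = Chain([
    PromptTemplate("Summarize in two sentences:\n{input}"),
    PromptTemplate("Translate to French:\n{input}"),
])
print(summarize_then_translate.run("Long source document goes here..."))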

42 of 104

Vector Databases

  • Pinecone, Weaviate, Milvus, Qdrant, Chroma, FAISS, pgvector, …
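
At its core, a vector database stores embeddings and answers nearest-neighbor queries over them (plus indexing, filtering, and persistence on top). Below is a minimal numpy sketch of that core operation; the hash-based embed function is a toy stand-in for a real text encoder, so the ranking it returns is not semantically meaningful.

import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic 'embedding'; replace with a real encoder model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

docs = [
    "LLaMA 2 is an open-source LLM",
    "Claude 2 has a 100k token window",
    "WMT is a machine translation competition",
]
index = np.stack([embed(d) for d in docs])  # (n_docs, dim) matrix of unit vectors

def search(query: str, k: int = 2):
    scores = index @ embed(query)           # cosine similarity, since vectors are unit-norm
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

print(search("open source language models"))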

43 of 104

LLM Programs

  • LLMs are embedded into a program or algorithm.
  • The LLM is NOT responsible for maintaining the current state of the program (i.e., its context).
  • For each step of the program, the LLM is presented only with a step-specific prompt and context (a minimal sketch follows the reference below).

“Large Language Model Programs”, https://arxiv.org/abs/2305.05364
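
A minimal illustration of the pattern (not the paper's implementation): the surrounding Python program owns the control flow and the accumulated state, and each call to the placeholder call_llm sees only a small, step-specific prompt.

def call_llm(prompt: str) -> str:
    # Placeholder for a real completion API.
    return f"[answer to: {prompt[:50]}...]"

def answer_with_decomposition(question: str) -> str:
    # Step 1: the model is asked only for sub-questions, not the whole plan.
    subqs = call_llm(f"List three sub-questions needed to answer:\n{question}").splitlines()

    # Step 2: each sub-question is answered in isolation; the program keeps the notes.
    notes = [call_llm(f"Answer briefly: {sq}") for sq in subqs if sq.strip()]

    # Step 3: the synthesis prompt gets only the accumulated notes, nothing else.
    context = "\n".join(notes)
    return call_llm(f"Using these notes:\n{context}\n\nAnswer the question: {question}")

print(answer_with_decomposition("Why did a single multilingual model win WMT 2021?"))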

44 of 104

Tree-of-Thought

“Large Language Model Guided Tree-of-Thought”, https://arxiv.org/abs/2305.08291
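
In the spirit of the paper, Tree-of-Thought turns generation into search: propose several intermediate "thoughts", score them, keep the best few, and expand again. Below is a toy breadth-first sketch of that control flow; propose and score are random stand-ins for the LLM calls, so only the search skeleton is real here.

import random

def propose(state: str, n: int = 3) -> list:
    # Stand-in for an LLM generating n candidate next thoughts.
    return [f"{state} -> step{random.randint(1, 99)}" for _ in range(n)]

def score(state: str) -> float:
    # Stand-in for an LLM (or heuristic) rating a partial solution.
    return random.random()

def tree_of_thought(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [problem]
    for _ in range(depth):
        candidates = [c for s in frontier for c in propose(s)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]  # keep the `beam` best
    return max(frontier, key=score)

print(tree_of_thought("Make 24 from the numbers 4, 9, 10, 13"))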

45 of 104

Graph of Thoughts

“Graph of Thoughts: Solving Elaborate Problems with Large Language Models”, https://arxiv.org/abs/2308.09687

46 of 104

Auto-GPT

  • Experimental open-source application showcasing the capabilities of the GPT-4 language model.
  • An "AI agent" that, given a goal in natural language, attempts to achieve it by breaking it into sub-tasks.
  • Can use tools like internet search, memory, file storage & summarization, and a system of plugins (a toy agent loop follows this list).
  • More than twice as many GitHub stars (148k) as PyTorch (70.4k).
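
A toy version of that loop, assuming a single search tool and a placeholder call_llm; it only illustrates the goal-to-actions cycle and is not Auto-GPT's actual code.

def call_llm(prompt: str) -> str:
    # Placeholder completion; a real agent would get a fresh action each step.
    return "search: open-source LLM benchmarks"

def web_search(query: str) -> str:
    return f"[top results for '{query}']"  # placeholder tool

TOOLS = {"search": web_search}

def agent(goal: str, max_steps: int = 3) -> list:
    memory = []
    for _ in range(max_steps):
        prompt = f"Goal: {goal}\nMemory: {memory}\nReply as '<tool>: <input>' or 'done'."
        action = call_llm(prompt)
        if action.strip().lower() == "done":
            break
        tool, _, arg = action.partition(":")
        handler = TOOLS.get(tool.strip(), lambda x: f"unknown tool '{tool}'")
        memory.append(f"{action} => {handler(arg.strip())}")
    return memory

print(agent("Summarize the open-source LLM landscape"))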

47 of 104

7. Code

48 of 104

Code LLaMA

“Introducing Code Llama, an AI Tool for Coding”

https://about.fb.com/news/2023/08/code-llama-ai-for-coding/

49 of 104

WizardCoder

50 of 104

Other LLMs for code

  • Copilot (OpenAI Codex)
  • StarCoder
  • StableCode
  • Pangu-Coder2
  • GPT-4, PaLM 2, and other general LLMs can also be useful

51 of 104

8. Context on Multilinguality

52 of 104

Multilinguality

Many models are now multilingual by default

  • GPT-3+ (can generate in many languages & translate)
  • BLOOM (46 human + 13 programming langs)
  • NLLB-200 (200 langs)
  • PaLI (109 langs)

Sometimes this happens without any special effort:

“OPT was not intentionally trained to be multilingual, but we found anecdotally it has limited success with simple translations in German, Spanish, French, and Chinese”

53 of 104

Curse of Multilinguality

“The experiments expose a trade-off as we scale the number of languages for a fixed model capacity: more languages leads to better cross-lingual performance on low-resource languages up until a point, after which the overall performance on monolingual and cross-lingual benchmarks degrades. We refer to this tradeoff as the curse of multilinguality, and show that it can be alleviated by simply increasing model capacity.”

“For a fixed sized model, the per-language capacity decreases as we increase the number of languages. While low-resource language performance can be improved by adding similar higher-resource languages during pretraining, the overall downstream performance suffers from this capacity dilution. Positive transfer and capacity dilution have to be traded off against each other.”

“Unsupervised Cross-lingual Representation Learning at Scale”, https://aclanthology.org/2020.acl-main.747/

54 of 104

[Meta] WMT 2021

For the first time, a single multilingual model has outperformed the best specially trained bilingual models across 10 out of 14 language pairs to win WMT (news translation task).

55 of 104

56 of 104

[Meta] NLLB-200

57 of 104

[Google] 1,000 Languages Initiative

“That’s why today we’re announcing the 1,000 Languages Initiative, an ambitious commitment to build an AI model that will support the 1,000 most spoken languages, bringing greater inclusion to billions of people in marginalized communities all around the world.

And our most advanced language models are multimodal – meaning they’re capable of unlocking information across these many different formats. With these seismic shifts come new opportunities.

As part of this initiative and our focus on multimodality, we’ve developed a Universal Speech Model — or USM — that’s trained on over 400 languages, making it the largest language coverage seen in a speech model to date.”

58 of 104

8a. Machine Translation (MT)

59 of 104

60 of 104

61 of 104

62 of 104

> MT is being disrupted by LLMs now

63 of 104

Prompting MT (our prediction in 2021)

64 of 104

65 of 104

> Some MT systems might be replaced by LLMs soon

66 of 104

67 of 104

68 of 104

> Transcreation replaces Translation

69 of 104

70 of 104

71 of 104

> But MT and LLMs may still need each other

72 of 104

> MT helps LLM

73 of 104

Datasets are still dominated by English

English is still the major language in many datasets used to train LLMs.

So, not surprisingly, models solve tasks better in English than in other languages, especially low-resource ones.

74 of 104

LLMs may work better with translation

“Do Multilingual Language Models Think Better in English?“ https://arxiv.org/abs/2308.01223
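
A tiny sketch of the translate-then-prompt pattern the paper studies: move the query into English, let the model answer there, then translate the answer back. translate and call_llm are placeholders for a real MT service and completion API.

def translate(text: str, src: str, tgt: str) -> str:
    return f"[{text!r} translated {src}->{tgt}]"  # placeholder MT call

def call_llm(prompt: str) -> str:
    return "[answer produced in English]"         # placeholder completion

def answer_in(language: str, question: str) -> str:
    q_en = translate(question, src=language, tgt="en")  # 1. move the task into English
    a_en = call_llm(f"Answer concisely: {q_en}")        # 2. let the model reason in English
    return translate(a_en, src="en", tgt=language)      # 3. answer in the user's language

print(answer_in("fr", "Qu'est-ce qu'un modèle de fondation ?"))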

75 of 104

All languages are NOT (tokenized) equal

Because of tokenization, the same text may require fewer tokens in English than in, say, Korean. So English gets a lower price and a longer effective context!
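
A quick way to see the gap, using OpenAI's open-source tiktoken tokenizer (pip install tiktoken). Exact counts depend on the tokenizer and the sample sentences, so treat the numbers as illustrative.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-3.5/GPT-4 models

samples = {
    "English": "Machine translation is being disrupted by large language models.",
    "Korean": "기계 번역은 대규모 언어 모델에 의해 변화하고 있습니다.",
    "Russian": "Машинный перевод меняется под влиянием больших языковых моделей.",
}

for lang, text in samples.items():
    n_tokens = len(enc.encode(text))
    print(f"{lang:8s}: {n_tokens:3d} tokens for {len(text)} characters")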

76 of 104

English has the shortest median token length

77 of 104

> LLM helps MT

78 of 104

Improving MT results with LLMs

There is a huge area of Source Quality Improvement and Automated Post-editing to improve MT results (a prompt sketch follows this list):

  • Fixing grammar
  • Rewriting for MT
  • Setting correct gender
  • Using proper tone of voice
  • Applying specific terminology
  • Style correction according to guidelines
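
A sketch of what such LLM-based post-editing can look like: give the model the source, the raw MT output, and the terminology and tone constraints, and ask for a corrected translation. The prompt wording and the call_llm stub are illustrative, not a production recipe.

def call_llm(prompt: str) -> str:
    return "[post-edited translation]"  # placeholder completion

def post_edit(source: str, mt_output: str, glossary: dict, tone: str) -> str:
    terms = "\n".join(f"- '{src}' must be translated as '{tgt}'" for src, tgt in glossary.items())
    prompt = (
        "You are a professional post-editor.\n"
        f"Source text:\n{source}\n\n"
        f"Machine translation:\n{mt_output}\n\n"
        f"Required terminology:\n{terms}\n"
        f"Tone of voice: {tone}\n\n"
        "Fix grammar, gender agreement and terminology; return only the corrected translation."
    )
    return call_llm(prompt)

print(post_edit("Release notes for version 2.0 ...",
                "Notes de version pour la version 2.0 ...",
                glossary={"release notes": "notes de version"},
                tone="formal"))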

79 of 104

9. Multimodality is a norm

80 of 104

GPT-4 with visual inputs

GPT-4 was trained in a multimodal setting (text + image) and can take images as input.

This feature is not yet available in the API, though.

81 of 104

PaLM-E (Google)

“PaLM-E: An Embodied Multimodal Language Model”

https://arxiv.org/abs/2303.03378

82 of 104

Kosmos-1 (& 2): a multimodal LLM (MLLM)

“Language Is Not All You Need: Aligning Perception with Language Models”

https://arxiv.org/abs/2302.14045

83 of 104

OSS Flamingo: OpenFlamingo (9B)

84 of 104

OSS Flamingo: IDEFICS (9B, 80B)

85 of 104

[Google] Universal Speech Model (USM)

  • 2B parameters
  • Trained on 12 million hours of speech and 28 billion sentences of text
  • 300+ languages

86 of 104

[Meta] SeamlessM4T

  • ASR for ~100 languages
  • S2TT for ~100 input and output languages
  • S2ST for ~100 input and 35 (+ English) output languages
  • T2TT for ~100 languages
  • T2ST for ~100 input and 35 (+ English) output languages

87 of 104

Risks: Voice cloning

88 of 104

Dynalang (DreamerV3 + LLM)

“Learning to Model the World with Language”, https://arxiv.org/abs/2308.01399

89 of 104

DeepMind Gato 2 is on the horizon?

“A Generalist Agent”, https://arxiv.org/abs/2205.06175

90 of 104

Image Generation

  • Stable Diffusion
  • MidJourney
  • DALLE 2
  • Adobe products

They’re also multimodal models (a minimal generation sketch follows this list):

  • Inputs: Text + Image (both optional)
  • Output: Image
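
A minimal text-and-image-to-image sketch with the Hugging Face diffusers library, assuming a GPU and pip install diffusers transformers accelerate pillow; the model id is the public Stable Diffusion 1.5 checkpoint, used here only as an example.

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"

# Text -> Image
txt2img = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
image = txt2img("a watercolor map of Yerevan at sunset").images[0]
image.save("txt2img.png")

# Text + Image -> Image (both inputs optional per the slide; here we use both)
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
init = Image.open("txt2img.png").convert("RGB")
edited = img2img(prompt="same scene in winter, snow", image=init, strength=0.6).images[0]
edited.save("img2img.png")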

91 of 104

[Runway] Generating video: Gen1 & Gen2

Modes:

  • Text to Video
  • Text + Image to Video
  • Image to Video
  • Stylization
  • Storyboard →
  • Mask
  • Render
  • Customization

92 of 104

10. Safety, Legal & Ethical issues

93 of 104

Misuse & Malicious use

94 of 104

Legal risks

  • Rights to use different data to train models
    • does it violate licences or not?
    • who owns the result?
    • can you copyright the result?
    • what about plagiarism?
  • What if AI
    • generates libel or slander?
    • provides false or dangerously wrong info?
    • exposes private info?

95 of 104

Ethical problems

96 of 104

NPU Uprising: American Military's AI Cluster Goes Rogue, Authorities Scramble to Regain Control

97 of 104

> Push towards transparency & responsibility. But different approaches.

98 of 104

“Pause Giant AI Experiments: An Open Letter”

“Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? Such decisions must not be delegated to unelected tech leaders. Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.” … “Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”

99 of 104

“Time is running out: Demand responsible AI development!”

“Therefore, we call for:

  1. The creation of clear ethical guidelines for AI development that promote respect for human rights, citizen privacy and social justice.
  2. The creation of a binding, national and international regulatory framework for AI developers and companies that ensures AI systems are transparent, fair and non-discriminatory.
  3. The promotion of research and collaboration on AI safety, fairness and accountability to minimise potential negative impacts of AI systems.”

100 of 104

“Keep up the progress tempo”

Join us in our urgent mission to democratize AI research by establishing an international, publicly funded supercomputing facility equipped with 100,000 state-of-the-art AI accelerators to train open source foundation models.

“the open-source nature of this project will promote safety and security research, allowing potential risks to be identified and addressed more rapidly and transparently by the academic community and open-source enthusiasts. This is a vital step in ensuring the safety and reliability of AI technologies as they become increasingly integrated into our lives.”

101 of 104

“Include Consciousness Research”

To understand whether AI systems are, or can become, conscious, tools are needed that can be applied to artificial systems. In particular, science needs to further develop formal and mathematical tools to model consciousness and its relationship to physical systems. In conjunction with empirical and experimental methods to measure consciousness, questions of AI consciousness must be tackled.

102 of 104

More papers and discussions in our Telegram channel: https://t.me/gonzo_ML

103 of 104

BTW, the book is almost ready!

104 of 104

Thanks!