1 of 14

LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS

Microsoft Research

2 of 14

Problem Statement

- Fine-tuning is difficult for large language models because of memory constraints: full fine-tuning updates every parameter of the model

Contribution: Propose LoRA, a low-rank adaptation technique for parameter-efficient fine-tuning of large language models

3 of 14

Approach
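
LoRA freezes the pretrained weight matrix W0 and learns a low-rank update ΔW = BA, where A and B have rank r much smaller than the weight dimensions, so only these two small matrices are trained. A minimal PyTorch-style sketch of this reparameterization (the class name, initialization, and scaling below are illustrative, not the paper's reference code):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained linear layer plus a trainable low-rank update."""

        def __init__(self, d_in, d_out, r=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(d_in, d_out, bias=False)
            self.base.weight.requires_grad = False              # W0 stays frozen
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
            self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-init so ΔW = 0 at start
            self.scale = alpha / r

        def forward(self, x):
            # h = W0 x + (alpha / r) * B A x
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(d_in=768, d_out=768, r=8, alpha=16)
    out = layer(torch.randn(2, 768))   # only A and B receive gradients

At inference time the product BA can be merged into W0, so the adapted model adds no extra latency.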

4 of 14

Results

GLUE scores

5 of 14

Results

Accuracy

6 of 14

LONGLORA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS

- CUHK

- MIT

7 of 14

Problem Statement

- LLaMA, BERT, GPT: all trained with a fixed context size

- This makes them less effective for long documents

- Training from scratch with long sequences is difficult, since self-attention cost grows quadratically with sequence length

Intuition:

Although dense global attention is needed during inference, fine-tuning the model can be effectively and efficiently done by sparse local attention.

8 of 14

Contributions

  1. Shifted sparse attention mechanism: S2-Attn

  2. Parameter-Efficient Fine-Tuning

9 of 14

Previous approaches

If we have an LLM with a 2K context length but the sequence length is 8K, use multiple short attention groups

Text = [1, 2, 3, …, 7999, 8000]

Group 1: [1, 2, …, 2000]

Group 2: [2001, 2002, …, 4000]

Group 3: [4001, 4002, …, 6000]

Group 4: [6001, 6002, …, 8000]

No communication between groups
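
A minimal sketch of this baseline grouping (plain Python, illustrative indices only): the 8K-token sequence is cut into independent 2K-token groups, and attention would be computed only inside each group, so tokens in different groups never exchange information.

    seq_len, group_size = 8000, 2000
    tokens = list(range(1, seq_len + 1))
    # Non-overlapping groups; each group attends only to itself
    groups = [tokens[i:i + group_size] for i in range(0, seq_len, group_size)]
    print([(g[0], g[-1]) for g in groups])
    # [(1, 2000), (2001, 4000), (4001, 6000), (6001, 8000)]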

10 of 14

Motivation

Same setup: an LLM with a 2K context length and an 8K sequence, but now the attention groups are shifted by half the group size (1,000 tokens)

Text = [1, 2, 3, …, 7999, 8000]

Group 1: [1001, 1002, …, 3000]

Group 2: [3001, 3002, …, 5000]

Group 3: [5001, 5002, …, 7000]

Group 4: [7001, 7002, …, 8000, 1, 2, …, 1000]

Some communication between groups
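
A matching sketch of the shifted grouping (plain Python, illustrative indices only): the same sequence is rolled by half a group (1,000 tokens) before being cut into 2K-token groups, so every new group straddles a border of the original grouping and the last group wraps around to the start of the sequence.

    seq_len, group_size = 8000, 2000
    shift = group_size // 2
    tokens = list(range(1, seq_len + 1))
    # Roll the sequence by half a group, then cut into groups as before
    rolled = tokens[shift:] + tokens[:shift]
    shifted_groups = [rolled[i:i + group_size] for i in range(0, seq_len, group_size)]
    print([(g[0], g[-1]) for g in shifted_groups])
    # [(1001, 3000), (3001, 5000), (5001, 7000), (7001, 1000)]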

11 of 14

S2-Attn

Within each group, standard full attention is computed.

Information flows between groups by shifting the tokens of half of the attention heads by half the group size.
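
A rough PyTorch-style sketch of the shift step (the tensor layout and function name are assumptions, not the paper's reference implementation): half of the heads keep the original token order, while the other half is rolled by half the group size before grouped attention is applied.

    import torch

    def shift_half_heads(x, group_size):
        # x: (batch, num_heads, seq_len, head_dim)
        # The first half of the heads keeps the original order; the second half
        # is rolled so its attention groups straddle the original group borders.
        n_heads = x.shape[1]
        shifted = x.clone()
        shifted[:, n_heads // 2:] = torch.roll(
            x[:, n_heads // 2:], shifts=-group_size // 2, dims=2
        )
        return shifted

    # After the shift, both halves are split into groups of `group_size` tokens
    # and standard attention runs independently within each group.
    x = torch.randn(1, 8, 8000, 64)           # e.g. 8 heads over an 8K sequence
    x_shifted = shift_half_heads(x, group_size=2000)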

12 of 14

LongLoRA

LongLoRA = S2-Attn + LoRA
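
A rough sketch of what this combination means for the trainable parameters (the toy module below is hypothetical, with made-up names and sizes): S2-Attn handles the long context during fine-tuning, LoRA keeps the backbone frozen, and LongLoRA additionally leaves the embedding and normalization layers trainable.

    import torch.nn as nn

    # Hypothetical toy model standing in for one transformer block
    model = nn.ModuleDict({
        "embed": nn.Embedding(32000, 512),
        "q_proj": nn.Linear(512, 512, bias=False),   # frozen pretrained weight
        "lora_A": nn.Linear(512, 8, bias=False),     # low-rank LoRA factors
        "lora_B": nn.Linear(8, 512, bias=False),
        "norm": nn.LayerNorm(512),
    })

    # Train only the LoRA factors, embeddings, and normalization layers
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in ("lora_", "embed", "norm"))

    print([n for n, p in model.named_parameters() if p.requires_grad])
    # ['embed.weight', 'lora_A.weight', 'lora_B.weight', 'norm.weight', 'norm.bias']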

13 of 14

Advantages

  1. Preservation of original architecture

  2. Easy implementation

14 of 14

Results