1 of 10

Empowering the Future with Multilinguality and Language Diversity


En-Shiun Annie Lee*, Kosei Uemura, Syed Mekael Wasti*, Mason Shipton*

*Ontario Tech University, Canada; University of Toronto, Canada

Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course

2 of 10

Presentation Contents

  • Introduction to the Course and Learning Outcomes
  • Target Audience
  • Course Contents
  • Assignments
    • Assignment 1: A Journey Through Language Modelling
    • Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer
    • Assignment 3: Adapting Languages with Fine-Tuning


3 of 10

Introduction to the Course and Learning Outcomes


  • CSCI 4055 Natural Language Processing - a computer science course
  • Teaches state-of-the-art Natural Language Processing (NLP) techniques
  • Learning outcomes of the course include:
    • Deep understanding of NLP concepts & methods
    • Ability to modify & debug NLP code proficiently
    • Application of advanced NLP techniques in Large Language Models (LLMs)
    • Personal connections to multilinguality and language diversity projects

4 of 10

Target Audience

  • The course is aimed at the following OTU computer science students:
    • Upper-year undergraduates
    • Graduate students - last year, every student in the class spoke a non-official second language

  • High proportion of first-generation immigrants at OTU
    • The focus on multilinguality and language diversity empowers the local population


Figure 1: Map of non-official languages in the OTU region

(Box size is proportional to the percentage of speakers of each non-official language)

5 of 10

Course Contents

  • Three weeks are devoted to invited speakers working on multilinguality and language diversity
  • Weekly laboratories will require students to:
    • Work through NLP tasks in Jupyter notebooks
    • Answer quiz questions relevant to the task’s code


Table 1: Contents of the weekly lectures and corresponding lab notebooks.

6 of 10

3 Assignments Aimed at Multilinguality

Assignment 1: A Journey through Language Modelling

Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer

Assignment 3: Adapting Languages with Fine-Tuning


  • Complement lecture materials & emphasize self-directed learning with projects
  • Focus on multilinguality and leveraging personal experiences
  • Foster community and collaboration

Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course

7 of 10

Assignment 1: A Journey through Language Modelling

  • Assignment 1 introduces language modelling for low-resource languages

  • Students will:
    • Process datasets and build vocabularies (Schmidt et al., 2024)
    • Implement three models: statistical n-gram, neural n-gram, and Transformer (Gaddy et al., 2021; Vig & Belinkov, 2019); a minimal bigram sketch follows below

  • Based on UC Berkeley's CS 288, Project 1


Source: (Botpenguin, 2024)
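To make the modelling pipeline concrete, here is a minimal sketch of the statistical starting point: a count-based bigram language model with add-one smoothing. This is an illustrative toy, not the assignment's starter code; the tiny corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict
import math

# Toy corpus; in the assignment, students work with a real low-resource-language dataset.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
]

vocab = {word for sentence in corpus for word in sentence}
unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for prev, curr in zip(sentence, sentence[1:]):
        unigram_counts[prev] += 1
        bigram_counts[prev][curr] += 1

def bigram_logprob(prev, curr):
    """Add-one (Laplace) smoothed log P(curr | prev)."""
    numerator = bigram_counts[prev][curr] + 1
    denominator = unigram_counts[prev] + len(vocab)
    return math.log(numerator / denominator)

def sentence_logprob(sentence):
    """Log-probability of a whole sentence under the bigram model."""
    return sum(bigram_logprob(p, c) for p, c in zip(sentence, sentence[1:]))

print(sentence_logprob(["<s>", "the", "cat", "sat", "</s>"]))
```

The neural n-gram and Transformer variants in the assignment replace the count table with learned parameters, but they expose the same interface: assigning probabilities to held-out text.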

8 of 10

Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer

  • Covers neural machine translation (NMT) through custom vocabulary building & a Transformer implementation (Vaswani et al., 2023)
  • Students will:
    • Implement a custom transformer architecture
    • Use core training concepts:
      • Gradient descent (Robbins & Monro, 1951)
      • Backpropagation (Rumelhart et al., 1986)
    • Evaluate performance with the BLEU score (Papineni et al., 2002); a from-scratch BLEU sketch follows below

  • This enhances their NMT and machine learning skills


Source: (Facebook Engineering, 2018)
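As a companion to the evaluation step, here is a minimal sentence-level BLEU in the spirit of Papineni et al. (2002): clipped n-gram precisions combined with a brevity penalty. It is a simplified illustration (single reference, no smoothing), not the metric implementation students would submit; in practice a library such as sacreBLEU is the safer choice.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipping: a candidate n-gram is credited at most as often as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this toy version
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages translations that are shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

candidate = "the cat sat on the mat".split()
reference = "the cat sat on the red mat".split()
print(round(sentence_bleu(candidate, reference), 3))
```

The gradient descent and backpropagation pieces are exercised inside the Transformer training loop itself, where the cross-entropy loss on translated tokens drives the parameter updates.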

9 of 10

Assignment 3: Adapting Languages with Fine-Tuning

  • Focuses on adapting models for low-resource languages
  • Students will:
    • Use NMT datasets and select a fine-tuning method: LoRA (Hu et al., 2021) or prompt tuning (Lester et al., 2021); a minimal LoRA sketch follows below
    • Justify their strategy, fine-tune models, and compare with baselines

  • This enhances neural architecture knowledge and NLP research skills


Source: (Towards Data Science, 2024)

Source: (Data Science Stack Exchange, 2024)
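To show what the LoRA option amounts to, the sketch below wraps a single frozen linear layer with a trainable low-rank update in PyTorch, following the idea in Hu et al. (2021). It is a minimal illustration, not the assignment's reference solution; in practice students would more likely apply an existing implementation (for example, the Hugging Face peft library) to a pretrained NMT model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the low-rank factors A and B are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # B starts at zero, so the adapted layer initially matches the pretrained one
        # and only drifts as A and B are updated during fine-tuning.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Example: adapt one 512x512 projection and count trainable parameters.
layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 512 = 8192 trainable, versus 262,656 frozen in the base layer
```

Both fine-tuning options keep the pretrained backbone frozen; LoRA adds low-rank updates to weight matrices, while prompt tuning instead learns a small set of virtual input embeddings, which is the contrast students are asked to justify.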

10 of 10

References


Botpenguin Glossary. (2024). N-gram. Retrieved from https://botpenguin.com/glossary/n-gram.

Brown, P. F., Della Pietra, V. J., Desouza, P. V., Lai, J. C., & Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467-480.

Data Science Stack Exchange. (2024). Meaning of fine-tuning in NLP task. Retrieved from https://datascience.stackexchange.com/questions/52719/meaning-of-fine-tuning-in-nlp-task.

Facebook Engineering. (2018). Under the hood: Multilingual embeddings. Retrieved from https://engineering.fb.com/2018/01/24/ml-applications/under-the-hood-multilingual-embeddings/.

Gaddy, D., Fried, D., Kitaev, N., Stern, M., Corona, R., DeNero, J., & Klein, D. (2021). Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP (pp. 104-107).

Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685.

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. Preprint, arXiv:2001.08361.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436-444.

Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Preprint, arXiv:2104.08691.

Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). Philadelphia, PA: Association for Computational Linguistics.

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning (pp. 28492-28518). PMLR.

Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Santra, P., Ghosh, M., Mukherjee, S., Ganguly, D., Basuchowdhuri, P., & Naskar, S. K. (2023). Unleashing the power of large language models: A hands-on tutorial. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 149-152).

Schmidt, C. W., Reddy, V., Zhang, H., Alameddine, A., Uzan, O., Pinter, Y., & Tanner, C. (2024). Tokenization is more than compression. Preprint, arXiv:2402.18376.

Towards Data Science. (2024). Understanding LoRA: Low-Rank Adaptation for Finetuning Large Models. Retrieved from https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). Attention is all you need. Preprint, arXiv:1706.03762.

Vig, J., & Belinkov, Y. (2019). Analyzing the structure of attention in a transformer language model. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (p. 63).