1 of 10

Empowering the Future with Multilinguality and Language Diversity


En-Shiun Annie Lee*, Kosei Uemura, Syed Mekael Wasti*, Mason Shipton*

*Ontario Tech University, Canada; University of Toronto, Canada

Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course

2 of 10

Presentation Contents

  • Introduction to the Course and Learning Outcomes
  • Target Audience
  • Course Contents
  • Assignments
    • Assignment 1: A Journey Through Language Modelling
    • Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer
    • Assignment 3: Adapting Languages with Fine-Tuning


3 of 10

Introduction to the Course and Learning Outcomes


  • CSCI 4055 Natural Language Processing - a computer science course
  • Teaches state-of-the-art Natural Language Processing (NLP) techniques
  • Learning outcomes of the course include:
    • Deep understanding of NLP concepts & methods
    • Ability to modify & debug NLP code proficiently
    • Application of advanced NLP techniques in Large Language Models (LLMs)
    • Personal connections to multilinguality and language diversity projects

4 of 10

Target Audience

  • The course is aimed at the following OTU computer science students:
    • Upper-year undergraduates
    • Graduate students - last year, every student in the class spoke a non-official second language

  • High proportion of first-generation immigrants at OTU
    • The focus on multilinguality and language diversity empowers the local population


Figure 1: Map of non-official languages in the OTU region

(Box size is proportional to the percentage of speakers of each non-official language)

5 of 10

Course Contents

  • Three weeks are devoted to invited speakers working on multilinguality and language diversity
  • Weekly laboratories will require students to:
    • Work through NLP tasks in Jupyter notebooks
    • Answer quiz questions relevant to the task’s code


Table 1: Contents of the weekly lectures and corresponding lab notebooks.

6 of 10

3 Assignments Aimed at Multilinguality

Assignment 1: A Journey through Language Modelling

Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer

Assignment 3: Adapting Languages with Fine-Tuning


  • Complement lecture materials & emphasize self-directed learning with projects
  • Focus on multilinguality and leveraging personal experiences
  • Foster community and collaboration

Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course

7 of 10

Assignment 1: A Journey through Language Modelling

  • Assignment 1 introduces language modelling for low-resource languages

  • Students will:
    • Process datasets and build vocabularies (Schmidt et al., 2024)
    • Implement three models: statistical n-gram, neural n-gram, and Transformer (Gaddy et al., 2021; Vig & Belinkov, 2019); a minimal bigram sketch follows below

  • Based on UC Berkeley's CS 288, Project 1


Source: (Botpenguin, 2024)
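To make the modelling pipeline concrete, here is a minimal sketch of the statistical starting point: a count-based bigram language model with add-one smoothing. This is an illustrative toy, not the assignment's starter code; the tiny corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict
import math

# Toy corpus; in the assignment, students work with a real low-resource-language dataset.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
]

vocab = {word for sentence in corpus for word in sentence}
unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    for prev, curr in zip(sentence, sentence[1:]):
        unigram_counts[prev] += 1
        bigram_counts[prev][curr] += 1

def bigram_logprob(prev, curr):
    """Add-one (Laplace) smoothed log P(curr | prev)."""
    numerator = bigram_counts[prev][curr] + 1
    denominator = unigram_counts[prev] + len(vocab)
    return math.log(numerator / denominator)

def sentence_logprob(sentence):
    """Log-probability of a whole sentence under the bigram model."""
    return sum(bigram_logprob(p, c) for p, c in zip(sentence, sentence[1:]))

print(sentence_logprob(["<s>", "the", "cat", "sat", "</s>"]))
```

The neural n-gram and Transformer variants in the assignment replace the count table with learned parameters, but they expose the same interface: assigning probabilities to held-out text.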

8 of 10

Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer

  • Covers neural machine translation (NMT) through custom vocabulary building & a Transformer implementation (Vaswani et al., 2023)
  • Students will:
    • Implement a custom transformer architecture
    • Use core training concepts:
      • Gradient descent (Robbins & Monro, 1951)
      • Backpropagation (Rumelhart et al., 1986)
    • Evaluate performance with the BLEU score (Papineni et al., 2002); a from-scratch BLEU sketch follows below

  • This enhances their NMT and machine learning skills


Source: (Facebook Engineering, 2018)
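As a companion to the evaluation step, here is a minimal sentence-level BLEU in the spirit of Papineni et al. (2002): clipped n-gram precisions combined with a brevity penalty. It is a simplified illustration (single reference, no smoothing), not the metric implementation students would submit; in practice a library such as sacreBLEU is the safer choice.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipping: a candidate n-gram is credited at most as often as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this toy version
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages translations that are shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

candidate = "the cat sat on the mat".split()
reference = "the cat sat on the red mat".split()
print(round(sentence_bleu(candidate, reference), 3))
```

The gradient descent and backpropagation pieces are exercised inside the Transformer training loop itself, where the cross-entropy loss on translated tokens drives the parameter updates.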

9 of 10

Assignment 3: Adapting Languages with Fine-Tuning

  • Focuses on adapting models for low-resource languages
  • Students will:
    • Use NMT datasets and select a fine-tuning method: LoRA (Hu et al., 2021) or prompt tuning (Lester et al., 2021); a minimal LoRA sketch follows below
    • Justify their strategy, fine-tune models, and compare with baselines

  • This enhances neural architecture knowledge and NLP research skills


Source: (Towards Data Science, 2024)

Source: (Data Science Stack Exchange, 2024)
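To show what the LoRA option amounts to, the sketch below wraps a single frozen linear layer with a trainable low-rank update in PyTorch, following the idea in Hu et al. (2021). It is a minimal illustration, not the assignment's reference solution; in practice students would more likely apply an existing implementation (for example, the Hugging Face peft library) to a pretrained NMT model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the low-rank factors A and B are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # B starts at zero, so the adapted layer initially matches the pretrained one
        # and only drifts as A and B are updated during fine-tuning.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Example: adapt one 512x512 projection and count trainable parameters.
layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 512 = 8192 trainable, versus 262,656 frozen in the base layer
```

Both fine-tuning options keep the pretrained backbone frozen; LoRA adds low-rank updates to weight matrices, while prompt tuning instead learns a small set of virtual input embeddings, which is the contrast students are asked to justify.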

10 of 10

References


Botpenguin Glossary. (2024). N-gram. Retrieved from https://botpenguin.com/glossary/n-gram.

Brown, P. F., Della Pietra, V. J., Desouza, P. V., Lai, J. C., & Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467-480.

Data Science Stack Exchange. (2024). Meaning of fine-tuning in NLP task. Retrieved from https://datascience.stackexchange.com/questions/52719/meaning-of-fine-tuning-in-nlp-task.

Facebook Engineering. (2018). Under the hood: Multilingual embeddings. Retrieved from https://engineering.fb.com/2018/01/24/ml-applications/under-the-hood-multilingual-embeddings/.

Gaddy, D., Fried, D., Kitaev, N., Stern, M., Corona, R., DeNero, J., & Klein, D. (2021). Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP (pp. 104-107).

Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685.

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. Preprint, arXiv:2001.08361.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436-444.

Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Preprint, arXiv:2104.08691.

Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). Philadelphia, PA: Association for Computational Linguistics.

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning (pp. 28492-28518). PMLR.

Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Santra, P., Ghosh, M., Mukherjee, S., Ganguly, D., Basuchowdhuri, P., & Naskar, S. K. (2023). Unleashing the power of large language models: A hands-on tutorial. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 149-152).

Schmidt, C. W., Reddy, V., Zhang, H., Alameddine, A., Uzan, O., Pinter, Y., & Tanner, C. (2024). Tokenization is more than compression. Preprint, arXiv:2402.18376.

Towards Data Science. (2024). Understanding LoRA: Low-Rank Adaptation for Finetuning Large Models. Retrieved from https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). Attention is all you need. Preprint, arXiv:1706.03762.

Vig, J., & Belinkov, Y. (2019). Analyzing the structure of attention in a transformer language model. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (p. 63).