Empowering the Future with Multilinguality and Language Diversity
En-Shiun Annie Lee*†, Kosei Uemura†, Syed Mekael Wasti*, Mason Shipton*
*Ontario Tech University, Canada †University of Toronto, Canada
Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course
Presentation Contents
Introduction to the Course and Learning Outcomes
Target Audience
Figure 1: Map of non-official languages in the OTU region
(Box size is proportional to the percentage of speakers of each non-official language)
Course Contents
Table 1: Contents of the weekly lectures and corresponding lab notebooks.
3 Assignments Aimed at Multilinguality
Assignment 1: A Journey through Language Modelling
Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer
Assignment 3: Adapting Languages with Fine-Tuning
Assignments can be found at https://github.com/Kosei1227/OTU-LLM-Course
Assignment 1: A Journey through Language Modelling
Source: (Botpenguin, 2024)
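As a rough illustration of the n-gram language modelling this assignment walks through, the sketch below trains a bigram model with add-one smoothing on a toy corpus. The corpus, function names, and smoothing choice are illustrative assumptions, not the assignment's reference solution (the actual notebooks are in the GitHub repository above).

```python
# Minimal bigram language model with add-one smoothing (illustrative sketch only).
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

corpus = ["the cat sat", "the dog sat", "a cat ran"]
unigrams, bigrams = train_bigram(corpus)
print(bigram_prob("the", "cat", unigrams, bigrams))  # seen bigram: higher probability
print(bigram_prob("the", "ran", unigrams, bigrams))  # unseen bigram: smoothed, lower
```

Smoothing is included so that unseen bigrams still receive non-zero probability, which is the standard motivation students encounter when moving from raw counts to a usable n-gram model.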
Assignment 2: Neural Machine Translation with Custom Vocabulary Building & Transformer
Source: (Facebook Engineering, 2018)
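To make the two pieces of this assignment concrete, the sketch below builds a word-level vocabulary from a toy parallel sentence pair and pushes the encoded ids through a small torch.nn.Transformer. The hyperparameters, the toy sentences, and the omission of positional encodings and attention masks are simplifying assumptions, not the assignment's actual setup.

```python
# Sketch of Assignment 2's two ingredients: a custom vocabulary and a small Transformer.
import torch
import torch.nn as nn

def build_vocab(sentences, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
    """Map every whitespace token (plus special symbols) to an integer id."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sent in sentences:
        for tok in sent.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into a 1 x T tensor of ids, wrapped in <bos>/<eos>."""
    ids = [vocab["<bos>"]] + [vocab.get(t, vocab["<unk>"]) for t in sentence.lower().split()] + [vocab["<eos>"]]
    return torch.tensor([ids])

src_vocab = build_vocab(["the cat sat on the mat"])
tgt_vocab = build_vocab(["le chat est assis sur le tapis"])

d_model = 32
src_emb = nn.Embedding(len(src_vocab), d_model)
tgt_emb = nn.Embedding(len(tgt_vocab), d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)                 # positional encodings omitted for brevity
generator = nn.Linear(d_model, len(tgt_vocab))           # projects decoder states to target vocabulary

src = encode("the cat sat on the mat", src_vocab)
tgt = encode("le chat est assis sur le tapis", tgt_vocab)
out = model(src_emb(src), tgt_emb(tgt))                  # (1, T_tgt, d_model)
logits = generator(out)                                  # (1, T_tgt, |tgt_vocab|)
print(logits.shape)
```

In the assignment itself, students replace the toy corpus with real parallel data and evaluate translations with BLEU (Papineni et al., 2002); this snippet only shows how a custom vocabulary feeds the encoder-decoder architecture of Vaswani et al.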
Assignment 3: Adapting Languages with Fine-Tuning
Source: (Towards Data Science, 2024)
Source: (Data Science Stack Exchange, 2024)
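The LoRA adapters illustrated above (Hu et al., 2021) can be summarised in a few lines of PyTorch: freeze the pretrained weight and learn only a low-rank update. The layer size, rank, and scaling below are illustrative assumptions; the assignment itself uses the notebooks in the repository rather than this standalone sketch.

```python
# Minimal sketch of the LoRA idea behind Assignment 3 (Hu et al., 2021):
# keep the pretrained weight W frozen and learn a low-rank update B @ A instead.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)       # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank                   # LoRA scaling factor

    def forward(self, x):
        # y = x W^T + b  +  scaling * x (B A)^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(64, 64, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")     # only the low-rank factors are trained
```

The point students should take away is the parameter count: only the two small low-rank factors are updated, which is what makes fine-tuning large models for new languages feasible on modest hardware.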
References
Botpenguin Glossary. (2024). N-gram. Retrieved from https://botpenguin.com/glossary/n-gram.
Brown, P. F., Della Pietra, V. J., Desouza, P. V., Lai, J. C., & Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467-480.
Data Science Stack Exchange. (2024). Meaning of fine-tuning in NLP task. Retrieved from https://datascience.stackexchange.com/questions/52719/meaning-of-fine-tuning-in-nlp-task.
Facebook Engineering. (2018). Under the hood: Multilingual embeddings. Retrieved from https://engineering.fb.com/2018/01/24/ml-applications/under-the-hood-multilingual-embeddings/.
Gaddy, D., Fried, D., Kitaev, N., Stern, M., Corona, R., DeNero, J., & Klein, D. (2021). Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP (pp. 104-107).
Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685.
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. Preprint, arXiv:2001.08361.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436-444.
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. Preprint, arXiv:2104.08691.
Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). Philadelphia, PA: Association for Computational Linguistics.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning (pp. 28492-28518). PMLR.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400-407.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
Santra, P., Ghosh, M., Mukherjee, S., Ganguly, D., Basuchowdhuri, P., & Naskar, S. K. (2023). Unleashing the power of large language models: A hands-on tutorial. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 149-152).
Schmidt, C. W., Reddy, V., Zhang, H., Alameddine, A., Uzan, O., Pinter, Y., & Tanner, C. (2024). Tokenization is more than compression. Preprint, arXiv:2402.18376.
Towards Data Science. (2024). Understanding LoRA: Low-Rank Adaptation for Finetuning Large Models. Retrieved from https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Preprint, arXiv:1706.03762.
Vig, J., & Belinkov, Y. (2019). Analyzing the structure of attention in a transformer language model. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (p. 63).