Advancing Language Identification in Code-Mixed Tulu Texts: Harnessing Deep Learning Techniques

Supriya Chanda (a), Anshika Mishra (b) and Sukomal Pal (a)

(a) Indian Institute of Technology (BHU), Varanasi, INDIA

(b) Vellore Institute of Technology Bhopal, Madhya Pradesh, INDIA

Agenda

Introduction

Dataset

Methodology

Result

Conclusion

Word-level Language Identification in Code-mixed Tulu Texts

This study addresses word-level language identification in code-mixed Tulu-English texts, a task that is crucial for handling the linguistic diversity observed on social media platforms. Tulu, a regional language, coexists with Kannada and English, especially in social media discourse among Tulu speakers. Because these languages are mixed together in Roman script, the resulting text forms a distinctive and largely unexplored dataset.
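Because Tulu, Kannada and English all appear in Roman script here, script alone cannot tell them apart, and word-level classifiers commonly rely on sub-word cues instead. The slides do not describe the feature pipeline, so the following is only a minimal sketch of one common choice, character n-grams with word-boundary markers; the helper names `char_ngrams` and `featurize_sentence` are hypothetical.

```python
def char_ngrams(word: str, n_min: int = 2, n_max: int = 4) -> list[str]:
    """Return character n-grams of a word, padded with boundary markers.

    Hypothetical helper: '<' marks the word start and '>' the word end,
    so prefixes and suffixes are distinguishable from inner substrings.
    """
    padded = f"<{word.lower()}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def featurize_sentence(sentence: str) -> list[tuple[str, list[str]]]:
    """Split a Roman-script sentence on whitespace and featurize each token.

    Word-level language identification predicts one label per token,
    so features are computed token by token rather than per sentence.
    """
    return [(token, char_ngrams(token)) for token in sentence.split()]


if __name__ == "__main__":
    print(char_ngrams("ok", 2, 3))  # ['<o', 'ok', 'k>', '<ok', 'ok>']
```

The boundary markers matter for code-mixed text because affixes often carry language-specific morphology that a bag of inner substrings would blur.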

Dataset

Description of the labels in the CoLI-Tunglish dataset

Class-wise distribution of the training and development sets

Methodology

Hyperparameters

Learning rate: 0.01

Batch size: 16

Training epochs: 10
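The slides list three hyperparameters but name neither the model nor the optimizer. As a minimal sketch, the values can be kept in one configuration object that also shows what the batch size and epoch count imply for the number of optimizer updates. The `TrainConfig` class is hypothetical, the batching unit (words or sentences) is not stated in the slides, and the 1,000-example size below is a placeholder, not the actual size of the CoLI-Tunglish training set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed on the slide."""
    learning_rate: float = 0.01
    batch_size: int = 16
    epochs: int = 10

    def steps_per_epoch(self, num_examples: int) -> int:
        # The final batch may be partial, so round up rather than truncate.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        # One optimizer update per batch, repeated for every epoch.
        return self.epochs * self.steps_per_epoch(num_examples)


if __name__ == "__main__":
    cfg = TrainConfig()
    # 1,000 is a placeholder training-set size for illustration only.
    print(cfg.total_steps(1000))  # ceil(1000 / 16) = 63 steps/epoch, x 10 = 630
```

Rounding up keeps the last partial batch; a data loader configured to drop incomplete batches would instead give 62 steps per epoch.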

Result

Evaluation results on the test data and the rank list
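The slides do not state which metric determines the ranking. Assuming macro-averaged F1, a common choice for classification shared tasks with uneven class sizes, the score can be computed as below. This is a from-scratch sketch; `macro_f1` is a hypothetical helper and the labels in the example are toy values, not data from the shared task.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # true label g was missed
    scores = []
    for label in labels:
        denom_p = tp[label] + fp[label]
        denom_r = tp[label] + fn[label]
        precision = tp[label] / denom_p if denom_p else 0.0
        recall = tp[label] / denom_r if denom_r else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["Tulu", "English", "Tulu", "Kannada"]  # toy labels
    pred = ["Tulu", "Tulu", "Tulu", "Kannada"]
    # Per-label F1: Tulu 0.8, English 0.0, Kannada 1.0
    print(round(macro_f1(gold, pred), 3))  # 0.6
```

Macro averaging gives every label equal weight regardless of frequency, so a system that ignores a rare label is penalized as heavily as one that fails on a common label.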

Conclusion

In conclusion, the CoLI-Tunglish shared task addressed the challenging problem of word-level language identification in code-mixed Tulu-English texts. The shared task offered insight into the current state of code-mixed language identification and encouraged further research in this evolving field. It also underlines the need for NLP techniques that can bridge the gap between the linguistic diversity of digital communication and automated language processing. Further work in this area is likely to yield more robust methods for handling code-mixed text, enabling more accurate language understanding and information retrieval in multilingual settings.

Thank You