Advancing Language Identification in Code-Mixed Tulu Texts: Harnessing Deep Learning Techniques
Supriya Chanda (a), Anshika Mishra (b) and Sukomal Pal (a)
(a) Indian Institute of Technology (BHU), Varanasi, India
(b) Vellore Institute of Technology Bhopal, Madhya Pradesh, India
Agenda
Introduction
Dataset
Methodology
Result
Conclusion
Word-level Language Identification in Code-mixed Tulu Texts
This study focuses on word-level language identification in code-mixed Tulu-English texts, a task that is crucial for handling the linguistic diversity observed on social media platforms. Tulu, a regional language, coexists with Kannada and English, especially in social media discourse among Tulu speakers. The mixing of these languages in Roman script has produced a unique and largely unexplored dataset.
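To make the task format concrete, the sketch below shows what "word-level" identification means in practice: each whitespace-separated token receives exactly one label. It is only an illustration of the input/output shape, not the deep learning model used in this work. The lookup lexicon, the placeholder tokens (`tuluword`, `kannadaword`), the `Other` label, and the function names are all assumptions introduced for the example.

```python
from collections import Counter

# Toy lexicon (hypothetical). "tuluword" and "kannadaword" are placeholders,
# not real vocabulary; the three language names come from the text above.
TOY_LEXICON = {
    "tuluword": "Tulu",
    "kannadaword": "Kannada",
    "movie": "English",
    "super": "English",
}


def tag_words(sentence, lexicon, default="Other"):
    """Return one (token, label) pair per whitespace-separated token.

    A dictionary lookup stands in for a trained classifier here, purely to
    show the expected output format of word-level language identification.
    """
    return [(tok, lexicon.get(tok.lower(), default)) for tok in sentence.split()]


def label_distribution(tagged):
    """Count how many tokens received each label."""
    return Counter(label for _, label in tagged)


tagged = tag_words("tuluword Movie kannadaword super !!", TOY_LEXICON)
for token, label in tagged:
    print(f"{token}\t{label}")
print(dict(label_distribution(tagged)))
```

A real system would replace the lookup with a learned model, but the output contract (one label per token) stays the same.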
Dataset
Description of labels in CoLI-Tunglish dataset
Class wise distribution of Train and Development dataset
Methodology
Hyperparameters
Learning rate: 0.01
Batch size: 16
Training epochs: 10
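As a rough sketch, the three hyperparameters above can be grouped into a single configuration object, which also makes it easy to see how many optimizer updates they imply. The class name `TrainConfig`, its helper methods, and the example dataset size are assumptions for illustration; only the three values come from the slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed above, bundled for reuse (hypothetical container)."""

    learning_rate: float = 0.01
    batch_size: int = 16
    epochs: int = 10

    def steps_per_epoch(self, n_examples: int) -> int:
        # The last batch may be smaller than batch_size, hence the ceiling.
        return math.ceil(n_examples / self.batch_size)

    def total_steps(self, n_examples: int) -> int:
        return self.epochs * self.steps_per_epoch(n_examples)


cfg = TrainConfig()
n_examples = 100  # placeholder size, not the actual dataset size
print(cfg.steps_per_epoch(n_examples), cfg.total_steps(n_examples))
```

With a placeholder set of 100 training examples, a batch size of 16 gives 7 steps per epoch and 70 steps over 10 epochs.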
Result
Evaluation results on test data and rank list
Conclusion
In conclusion, the CoLI-Tunglish shared task addressed the intricate challenge of word-level language identification in code-mixed Tulu-English texts. The shared task provided valuable insights into the state of the art in code-mixed language identification and encouraged further research in this evolving field. It underlines the need for advanced NLP techniques to bridge the gap between linguistic diversity in digital communication and automated language processing. Future endeavors in this domain will likely yield more robust solutions for handling code-mixed text, enabling more accurate language understanding and information retrieval in multilingual contexts.
Thank You