
Identifying the Type of Sarcasm in Dravidian Languages using Deep-Learning Models

FIRE CONFERENCE 2023

BY

RAMYA SIVAKUMAR

C JERIN MAHIBHA

B MONICA JENEFER


OBJECTIVES

Motivation

    • Understanding sarcasm is crucial for accurately interpreting cultural nuances. Dravidian languages, with their rich cultural diversity, often incorporate sarcasm as a form of expression.
    • Integrating sarcasm detection into sentiment analysis models for Dravidian languages can lead to more accurate assessments of public opinion and sentiment on various topics.
    • This has implications for developing chatbots, virtual assistants, and other AI applications that need to understand and respond appropriately to user inputs in Dravidian languages.


Objective

    • Identify sarcasm in social media comments
    • Train a deep-learning model using the training dataset
    • Tune the model with the development dataset
    • Predict the labels for the test comments


INTRODUCTION

    • The word "sarcasm" comes from the Greek "sark'azein," meaning to speak bitterly. Sarcasm is often used humorously to mock people, and it is easy to spot in face-to-face conversation through facial expressions and tone.

    • Detecting sarcasm in written messages, especially on platforms like YouTube, is hard: users are free to comment in any style, and there are no visual or auditory cues.

    • The same words can carry opposite meanings in writing. For example, "Thank you for your help" with a smiley face reads differently from the same message with an angry face.

    • Sarcasm detection is vital for better communication, avoiding misunderstandings, and analyzing sentiment in product reviews. It also helps with cybersecurity, wellness, and understanding the impact of social media.

    • Impact on AI development: recognizing sarcasm helps AI systems understand language nuances, improving overall performance by providing more relevant content in different contexts.


RELATED WORKS

S. No. | Paper | Year | Methodology
------ | ----- | ---- | -----------
1 | Offensive language identification using machine learning and deep learning techniques | 2021 | Bidirectional dual encoder with Additive Margin Softmax
2 | Multilingual sentiment analysis in Tamil, Malayalam and Kannada code-mixed social media posts using mBERT, in: FIRE (Working Notes) | 2021 | Pre-defined BERT model with the ktrain library
3 | Sarcasm detection over social media platforms using hybrid auto-encoder-based model | 2022 | Hybrid model comprising BERT, USE, and an autoencoder
4 | A machine learning approach in analysing the effect of hyperboles using negative sentiment tweets for sarcasm detection | 2022 | Hyperbole-based sarcasm detection (HbSD) model, designed and implemented by the authors
5 | Probabilistic model based context augmented deep learning approach for sarcasm detection in social media | 2020 | Probabilistic model that works with the help of a CNN


Summary of Related Works

Open Challenges

    • Lack of Annotated Dataset for Dravidian Languages
    • Consideration of Context-Based Features
    • Presence of Code-Mixed Data
    • Sarcasm Handling Strategies
    • Morphological Complexities of Dravidian Languages
    • Performance Variability with the Same Classifier and Dataset

Methodology

    • Machine learning Models
    • Deep Learning Models

Dataset

    • Dravidian languages lack a standardized annotated dataset


DATASET DESCRIPTION

    • The dataset was provided by the FIRE 2023 shared task
    • It contains comments extracted from various social media platforms
    • Each comment carries one of two labels: Sarcastic or Non-Sarcastic
    • Two sets of training data were provided: Tamil and Malayalam
    • Development and test data were likewise provided for each language
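
As a rough illustration of the dataset description, the sketch below reads labelled comments and maps the two labels to integers. The column names `Text` and `labels`, the CSV layout, and the sample rows are assumptions made for this example; the actual shared-task files may be organised differently.

```python
import csv
import io

# Assumed mapping from the two label names to integer ids.
LABEL_TO_ID = {"non-sarcastic": 0, "sarcastic": 1}

def load_comments(handle, text_col="Text", label_col="labels"):
    """Read (text, label_id) pairs from a CSV file-like object.

    The column names are hypothetical defaults, not taken from the task files.
    """
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_col].strip())
        labels.append(LABEL_TO_ID[row[label_col].strip().lower()])
    return texts, labels

# In-memory stand-in for one language's training file (invented rows).
sample = io.StringIO(
    "Text,labels\n"
    "first example comment,Sarcastic\n"
    "second example comment,Non-Sarcastic\n"
)
texts, labels = load_comments(sample)
print(texts, labels)
```

The same function could be called once per language and per split (training, development, test), keeping Tamil and Malayalam data separate.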


PROPOSED METHODOLOGY

Process

    • The data is pre-processed and encoded
    • The chosen deep-learning model is trained on the training dataset
    • The validation (development) dataset is used for fine-tuning
    • The trained model predicts the labels for the test dataset
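
The "pre-processed and encoded" step of this process might look like the sketch below. The slides do not state the cleaning rules or the tokenizer, so the rules here (lowercasing, removing URLs and @mentions, whitespace tokens, a fixed `max_len`) are assumptions for illustration; a transformer-based model would normally use its own subword tokenizer instead.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens

def preprocess(text):
    """Lowercase, drop URLs and @mentions, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def build_vocab(texts, min_count=1):
    """Assign an integer id to every token seen at least min_count times."""
    counts = Counter(tok for t in texts for tok in preprocess(t).split())
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab, max_len=8):
    """Turn a comment into a fixed-length list of token ids."""
    ids = [vocab.get(tok, UNK) for tok in preprocess(text).split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))

# Invented training comments, used only to build a toy vocabulary.
train_texts = ["Super movie http://x.y", "super song @user"]
vocab = build_vocab(train_texts)
encoded = encode("Super film", vocab, max_len=4)
print(encoded)
```

The vocabulary is built from the training comments only, so words that first appear in the development or test data fall back to the unknown-token id.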

Dataset Usage

    • Training datasets were provided for Tamil and Malayalam comments
    • Development and test datasets were also provided separately for the two languages
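
One way to use the development data for each language is to score candidate runs on it before predicting on the test set. The slides do not say which metric was used; the sketch below assumes macro-averaged F1, and the labels and run names are invented for illustration.

```python
def f1_for(label, y_true, y_pred):
    """F1 score for a single class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for(c, y_true, y_pred) for c in classes) / len(classes)

# Hypothetical development labels and predictions from two candidate runs.
dev_true = [1, 1, 0, 0]
runs = {"run_a": [1, 0, 0, 0], "run_b": [1, 1, 0, 0]}
scores = {name: macro_f1(dev_true, preds) for name, preds in runs.items()}
best_run = max(scores, key=scores.get)
print(scores, best_run)
```

Macro averaging weights both classes equally, which matters when one label is much rarer than the other.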

Results

    • Secured Rank 4 in Tamil and Rank 7 in Malayalam


METHODOLOGY FLOW


ERROR ANALYSIS

Tamil Dataset

S. No. | Text | Predicted Label | Actual Label
------ | ---- | --------------- | ------------
1 | Thala Thala tha tamil in beggast flim | Non-Sarcastic | Sarcastic
2 | Oruthar mela neenga viswasam kata..Inoruthara neenga asinga paduthuringa.... Deep | Sarcastic | Non-Sarcastic

Malayalam Dataset

S. No. | Text | Predicted Label | Actual Label
------ | ---- | --------------- | ------------
1 | Atom bomb, tsunami, volcano eruption oke athi jeevichadallejapan.Last dialogue seriyayilla | Non-Sarcastic | Sarcastic
2 | Kottayam to paala daily kettuuu vattanu e songgg my addicted song | Non-Sarcastic | Sarcastic
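
The four misclassified comments listed in the error analysis can be tallied by error type with a few lines of Python. This is only a sketch over the examples shown on the slide, not over the full prediction files, and the function name `error_breakdown` is a hypothetical helper.

```python
from collections import Counter

def error_breakdown(predicted, actual):
    """Count (predicted, actual) pairs where the two labels disagree."""
    return Counter((p, a) for p, a in zip(predicted, actual) if p != a)

# Labels of the four examples from the error analysis:
# Tamil rows 1-2, then Malayalam rows 1-2.
predicted = ["Non-Sarcastic", "Sarcastic", "Non-Sarcastic", "Non-Sarcastic"]
actual = ["Sarcastic", "Non-Sarcastic", "Sarcastic", "Sarcastic"]

breakdown = error_breakdown(predicted, actual)
for (pred, gold), count in breakdown.most_common():
    print(f"predicted {pred}, actual {gold}: {count}")
```

On these examples, three of the four errors are sarcastic comments predicted as Non-Sarcastic, and one is a non-sarcastic comment predicted as Sarcastic.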


CONCLUSION

    • Online communication is getting more complex, making traditional detection methods ineffective for sarcasm.

    • It's vital to tell the difference between sarcastic and non-sarcastic text online. Relying solely on language, sentiment, and syntax can lead to misunderstandings.

    • Sarcasm detection requires a focus on context and understanding meaning, going beyond linguistic features for more accurate results.

    • Training models on larger datasets is key to improving the accuracy and effectiveness of sarcasm detection.

    • Emojis and emoticons are crucial in conveying meaning on social media. Future sarcasm detection efforts may involve considering these visual elements in addition to text.


References

[1] A. Reyes, P. Rosso, T. Veale, A multidimensional approach for detecting irony in Twitter, Language Resources and Evaluation 47 (2013) 239–268.

[2] M. Birjali, M. Kasri, A. Beni-Hssane, A comprehensive survey on sentiment analysis: Approaches, challenges and trends, Knowledge-Based Systems 226 (2021) 107134.

[3] C. J. Mahibha, S. Kayalvizhi, D. Thenmozhi, Sentiment analysis using cross lingual word embedding model (2021).

[4] D. K. Sharma, B. Singh, S. Agarwal, H. Kim, R. Sharma, Sarcasm detection over social media platforms using hybrid auto-encoder-based model, Electronics 11 (2022) 2844.

[5] A. B. Meriem, L. Hlaoua, L. B. Romdhane, A fuzzy approach for sarcasm detection in social networks, Procedia Computer Science 192 (2021) 602–611.

[6] K. Sundararajan, A. Palanisamy, Probabilistic model based context augmented deep learning approach for sarcasm detection in social media, Int. J. Adv. Sci. Technol 29 (2020) 8461–79.

[7] D. Vinoth, P. Prabhavathy, An intelligent machine learning-based sarcasm detection and classification model on social networks, The Journal of Supercomputing 78 (2022) 10575–10594.

[8] V. Govindan, V. Balakrishnan, A machine learning approach in analysing the effect of hyperboles using negative sentiment tweets for sarcasm detection, Journal of King Saud University-Computer and Information Sciences 34 (2022) 5110–5120.


[9] A. Kalaivani, D. Thenmozhi, Multilingual sentiment analysis in Tamil Malayalam and Kannada code-mixed social media posts using mBERT, in: FIRE (Working Notes), 2021, pp. 1020–1028.

[10] S. Bellamkonda, M. Lohakare, S. Patel, A dataset for detecting humor in Telugu social media text, in: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 9–14. URL: https://aclanthology.org/2022.dravidianlangtech-1.2. doi:10.18653/v1/2022.dravidianlangtech-1.2.

[11] S. K. Lora, G. Shahariar, T. Nazmin, N. N. Rahman, R. Rahman, M. Bhuiyan, et al., Ben-Sarc: A corpus for sarcasm detection from Bengali social media comments and its baseline evaluation (2022).

[12] J. Mahibha, S. Kayalvizhi, D. Thenmozhi, Offensive language identification using machine learning and deep learning techniques (2021).

[13] B. R. Chakravarthi, N. Sripriya, B. Bharathi, K. Nandhini, S. Chinnaudayar Navaneethakrishnan, T. Durairaj, R. Ponnusamy, P. K. Kumaresan, K. K. Ponnusamy, C. Rajkumar, Overview of the shared task on sarcasm identification of Dravidian languages (Malayalam and Tamil) in DravidianCodeMix, in: Forum of Information Retrieval and Evaluation FIRE - 2023, 2023.

[14] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for self-supervised learning of language representations, arXiv preprint arXiv:1909.11942 (2019).


THANK YOU