1 of 22

Forum for Information Retrieval and Evaluation:

SARCASM DETECTION IN CODE-MIXED DRAVIDIAN LANGUAGES

Dhanya Krishnan

Krithika Dharanikota

Dr. B. Bharathi

2 of 22

TABLE OF CONTENTS

  • 01 INTRODUCTION: A brief look into sentiment analysis using ML
  • 02 METHODOLOGIES: The models and methods used in this study
  • 03 RESULTS: Results and metrics achieved
  • 04 INFERENCE: Our inferences and scope for future studies

3 of 22

INTRODUCTION

01

4 of 22

SENTIMENT ANALYSIS

  • Sentiment Analysis tools work to classify text based on the underlying emotion.
  • These tools rely either on ML models trained on labelled datasets or on rule-based lexicons.
  • Crucial for applications such as business and market research, politics, and social media monitoring.

5 of 22

CHALLENGES IN IDENTIFYING SARCASM

While sentiment analysis works well with directly expressed emotions, sarcasm detection is not easy:

  • Sarcasm relies heavily on context
  • Often involves tone and body language, which text does not carry
  • Class imbalance: sarcastic examples are a minority in most datasets
  • Ambiguity is often difficult for machines to pick up on

6 of 22

DATASET STATS

SARCASM IN TAMIL

Tamil has 19,866 non-sarcastic comments and 7,170 sarcastic comments (26.5% sarcastic).

SARCASM IN MALAYALAM

Malayalam has 9,798 non-sarcastic comments and 2,259 sarcastic comments (18.7% sarcastic).

The imbalance between sarcastic and non-sarcastic comments is highlighted in the graph.
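The class shares can be recomputed from the comment counts quoted on this slide. The sketch below does only that arithmetic; the dictionary layout and function names are illustrative and not part of the study.

```python
# Comment counts as stated on the slide.
counts = {
    "Tamil": {"non_sarcastic": 19866, "sarcastic": 7170},
    "Malayalam": {"non_sarcastic": 9798, "sarcastic": 2259},
}

def sarcastic_share(lang_counts):
    """Return the fraction of all comments that are labelled sarcastic."""
    total = lang_counts["non_sarcastic"] + lang_counts["sarcastic"]
    return lang_counts["sarcastic"] / total

def imbalance_ratio(lang_counts):
    """Return the number of non-sarcastic comments per sarcastic comment."""
    return lang_counts["non_sarcastic"] / lang_counts["sarcastic"]

for lang, c in counts.items():
    print(f"{lang}: {sarcastic_share(c):.1%} sarcastic, "
          f"{imbalance_ratio(c):.2f} non-sarcastic per sarcastic comment")
```

The ratio makes the imbalance concrete: the Malayalam set is noticeably more skewed than the Tamil set.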

7 of 22

METHODOLOGIES

02

8 of 22

9 of 22

OUR TOP PERFORMING MODELS

  • MALAYALAM: CountVectorizer with a Multilayer Perceptron classifier
  • TAMIL: TF-IDF vectorizer with a Multilayer Perceptron classifier
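A minimal sketch of the two vectorizer-plus-MLP combinations named on this slide, assuming scikit-learn is the library in use. The toy comments, labels, and all hyperparameters (hidden layer size, iteration limit, random seed) are hypothetical and are not taken from the study.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments (1 = sarcastic, 0 = non-sarcastic); not task data.
comments = [
    "semma padam vera level",
    "super movie nalla irukku",
    "wow enna oru acting oscar kudunga",
    "aama aama romba nalla padam thoongiten",
]
labels = [0, 0, 1, 1]

def build_model(vectorizer):
    """Chain a text vectorizer with an MLP classifier (settings are assumptions)."""
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    return make_pipeline(vectorizer, mlp)

count_mlp = build_model(CountVectorizer())   # combination listed for Malayalam
tfidf_mlp = build_model(TfidfVectorizer())   # combination listed for Tamil

for model in (count_mlp, tfidf_mlp):
    model.fit(comments, labels)
    print(model.predict(["vera level acting"]))
```

Wrapping both steps in one pipeline keeps the vocabulary learned on the training comments tied to the classifier, so unseen comments are vectorized the same way at prediction time.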

10 of 22

Why TF-IDF Vectorizer?

  • Assigns weights to words based on their frequency in a document and rarity across a corpus.
  • Measures the importance of terms in a specific document relative to their occurrence in the entire dataset.
  • Contextual Significance: Emphasizes terms that are distinctive for a comment, which helps surface words associated with sarcasm.
  • Multilingual Consideration: Works on surface tokens and needs no language-specific resources, so it applies directly to code-mixed comments containing a mix of languages.
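The weighting described above can be sketched with the textbook formula, tf × ln(N / df). This is an illustration only: the three toy comments are hypothetical, and library implementations typically use a smoothed variant of the IDF term, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical code-mixed toy comments, split on whitespace.
docs = [
    "super movie super acting",
    "wow super flop movie",
    "nalla movie",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of comments that contain each term.
df = Counter(term for tokens in tokenized for term in set(tokens))

def tfidf(tokens):
    """Return {term: tf * idf}, with tf = count / length and idf = ln(N / df)."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}

weights = tfidf(tokenized[1])
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {w:.3f}")
```

In the second comment, "movie" occurs in every comment and so receives zero weight, while "wow" and "flop" occur only there and receive the highest weights.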

11 of 22

Why CountVectorizer?

  • Represents each document by the number of times each word occurs in it, giving a simple and direct view of which terms appear.
  • Context-Agnostic: Records word frequencies without modelling word order or contextual relationships. Because the language and context of the comments vary widely, this simple representation proved useful in our task.
  • Multilingual Consideration: CountVectorizer builds its vocabulary directly from the tokens in the data, so it can represent the linguistic diversity of the dataset. This lets the model pick up sarcasm cues across the mixed-language expressions common in YouTube comments.
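The count-based representation can be illustrated without any library. The sketch below builds a vocabulary from two hypothetical toy comments and turns each comment into a vector of raw term counts; the tokenization (lowercasing and whitespace splitting) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical toy comments; not from the task data.
docs = ["super movie super acting", "wow super flop movie"]
tokenized = [d.lower().split() for d in docs]

# Vocabulary: every distinct token, in a fixed (sorted) column order.
vocabulary = sorted({t for tokens in tokenized for t in tokens})

def count_vector(tokens):
    """Return the raw count of each vocabulary term in one comment."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]

vectors = [count_vector(tokens) for tokens in tokenized]
print(vocabulary)
for vec in vectors:
    print(vec)
```

Word order is discarded: only how often each token occurs is kept, which is what makes the representation context-agnostic.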

12 of 22

Why Multilayer Perceptron?

  • Neural network architecture adept at capturing complex patterns in data and learning intricate relationships within feature-rich datasets.
  • Non-linear Pattern Recognition: MLP excels at capturing non-linear relationships and discerning subtle linguistic nuances to identify sarcasm effectively.
  • Adaptability to High-Dimensional Data: TF-IDF generates high-dimensional features, and the MLP leverages the weighted terms to distinguish between sarcastic and non-sarcastic expressions.
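To make the non-linearity point concrete, the sketch below runs the forward pass of a tiny MLP with one hidden layer of two ReLU units. The weights are set by hand for illustration (they are hypothetical, not learned, and unrelated to the study's models) so that the network computes XOR, a pattern that no single linear layer can represent.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, x)

def mlp_forward(x1, x2):
    """Forward pass of a 2-input, 2-hidden-unit, 1-output MLP with fixed weights."""
    h1 = relu(x1 + x2)          # hidden unit 1: weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1.0)    # hidden unit 2: weights (1, 1), bias -1
    return h1 - 2.0 * h2        # output layer: weights (1, -2), bias 0

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, mlp_forward(a, b))
```

The output is 1 only when exactly one input is active. The hidden layer is what allows this kind of feature interaction, which a purely linear model over the same features cannot capture.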

13 of 22

OUR RESULTS

03

14 of 22

Successful models for each language:

TAMIL

  • Count Vectorizer and MLP Classifier
  • TF-IDF Vectorizer and MLP Classifier
  • TF-IDF Vectorizer and Random Forest Classifier

MALAYALAM

  • Count Vectorizer and MLP Classifier
  • Count Vectorizer with Logistic Regression
  • TF-IDF and MLP Classifier

15 of 22

Metric values obtained:

16 of 22

Our results

  • Our team (SSNCSE1) ranked 2nd in Tamil and 1st in Malayalam for sarcasm detection.
  • The validation accuracy for Tamil ranged from 0.72 to 0.78, with F1-scores between 0.73 and 0.77. In Malayalam, accuracy ranged from 0.72 to 0.85, with F1-scores between 0.60 and 0.77.
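Accuracy and F1 can diverge on imbalanced data, which is one way to read the gap between the two metrics reported above. The sketch below computes both from scratch on hypothetical labels. The slides do not say which F1 averaging was used, so the macro average shown here is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Hypothetical labels: 8 non-sarcastic (0) and 2 sarcastic (1) comments,
# scored against a model that always predicts "non-sarcastic".
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10

print(accuracy(y_true, y_pred))
print(f1_for_class(y_true, y_pred, positive=1))
print(macro_f1(y_true, y_pred))
```

A model that never predicts the sarcastic class still scores well on accuracy here, while its F1 for the sarcastic class is zero, so F1 is the more informative number when sarcastic comments are a minority.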

17 of 22

INFERENCES

04

18 of 22

Our study contribution

  • Challenges such as language diversity, code-mixing, and class imbalance were effectively addressed by the proposed approaches.
  • The study emphasizes the universality of these challenges in both Tamil and Malayalam, highlighting the need for multilingual techniques in sentiment analysis.

19 of 22

Future Scope

  • Findings contribute to cross-linguistic sarcasm detection, with implications for practical applications like sentiment-driven content analysis and cyberbullying identification.
  • The study underscores the importance of flexible models capable of understanding language nuances in online interactions.

20 of 22

Future Scope

  • Real-time Application: Develop real-time applications for dynamic monitoring of sarcasm in online content, offering immediate insights into sentiment dynamics on platforms like YouTube.
  • Multimodal Analysis: Extend the study to include multimodal features, such as analyzing sarcasm in conjunction with visual elements in YouTube videos, for a more comprehensive understanding.

22 of 22

THANK YOU
