Department of Computer Science
Advisers: Deepak Kumar, Manar Darwish
Talk: May 7 @ 10:30 in Hilles 110
Sentiment analysis is an emerging area of application fueled by the increase of public participation in online social media. Much work has been done on sentiment analysis in English while less work has been done on other languages like Mandarin and Arabic. Arabic is spoken by hundreds of millions of people in over twenty countries. Modern Standard Arabic (MSA) is used online mostly by newspapers and other official sources. However, social media and blogs used by individuals are typically in Dialect Arabic (DA). My Senior Thesis work has been focused on exploring ways to increase the accuracy of automated sentiment analysis in Egyptian Arabic through using the specific features of Arabic.
I found that the baseline algorithm makes the most mistakes in classifying tweets that carry a sentiment as neutral tweets. Using Minimum Edit Distance (MED) and ISRI Arabic stemmer, I was able to decrease the error of the baseline algorithm by 31% without having to add any new entries to the lexicon. My approach has allowed me to not only get over the challenge of different morphological forms but also misspelling and informal writing. While I cannot empirically compare it to results by other authors as I am using a different data set, my approach reaches an accuracy of 78% which has an improvement of 14.7% over the baseline.