Text-Based Emotion Detection (Banglish, Bangla & English)

Emotion detection from text has become essential for understanding people's emotions at a fine-grained level and for improving the systems that respond to them. Applications in many fields can benefit from this research, including data mining, neuroscience, e-learning, cognitive science, information filtering, the understanding of expressed emotions, human-computer interaction, recommendation systems, and psychology. The large body of text available online as news articles, customer reviews, and social media comments and posts supports both text mining and emotion analysis. Bengali is the sixth most spoken language globally, yet comparatively little research has addressed text-based emotion detection in Bengali because resources are scarce, and the lack of labeled data already makes the task challenging for Bangla text. Because Bengali speakers live in a multilingual society, many of them write their social media comments and posts in transliterated Bengali. Almost all Bengali and transliterated Bengali text on social media is unstructured and must be organized and properly labeled before it can be used. The absence of standard classification models and annotated corpora for transliterated Bengali leaves this area of research largely unexplored. Developing a robust model for analyzing emotion in Bengali and transliterated Bengali social media text is therefore necessary to strengthen Bengali text-based sentiment analysis. This project describes the step-by-step development of a low-resource emotion corpus of transliterated Bangla (Banglish) text collected from social media platforms, and applies Machine Learning (ML), Deep Neural Network (DNN), and Transformer-based models to classify Ekman's six emotions in the collected Banglish data and in existing Bangla and English data. The BEmoD, ISEAR, IMDB, and SST2 datasets, together with the newly developed transliterated Bangla (Banglish) dataset, were used to train the models.
The transliterated Bangla dataset has a Cohen's kappa score of 89%, indicating high agreement among annotators. Experimental results on the Banglish data show that CNN + BiLSTM (FastText) outperforms all other DNN approaches with 67.54% accuracy, and Random Forest (Tf-Idf) outperforms all other ML approaches with 67.95% accuracy. On the BEmoD dataset, Bangla-BERT outperforms all other approaches with 62.19% accuracy. A BERT model trained on the SST2, IMDB, and ISEAR datasets achieved accuracies of 94.89%, 94.98%, and 72.17%, respectively.
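The agreement figure above is a Cohen's kappa, defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it in plain Python for two annotators. It is illustrative only: the emotion labels are made-up toy data, not samples from the Banglish corpus, and the abstract does not say which tool was used or how more than two annotators were combined (Cohen's kappa is defined for a pair; averaging over pairs or using Fleiss' kappa are common options).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal label proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # If both annotators used one identical category for every item, p_e == 1
    # and kappa is undefined; returning 1.0 here is a convention of this sketch.
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels (not from the thesis data).
    annotator_1 = ["joy", "joy", "anger", "anger"]
    annotator_2 = ["joy", "anger", "anger", "anger"]
    print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.5
```

In the toy example the annotators agree on 3 of 4 items (p_o = 0.75), but chance agreement is already 0.5, so κ = 0.5. This is why kappa is reported instead of raw percentage agreement.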
Department: Electrical and Computer Engineering
North South University
Printed Thesis