Deep Multimodal K-Fold Model for Emotion and Sentiment Analysis in Figurative Language
23 Pages · Posted: 7 Feb 2024 · Last revised: 18 Apr 2024
Date Written: February 7, 2024
Abstract
The "Deep multimodal K-fold model" is an approach to machine learning that uses multiple modalities, such as text and images, to analyze emotion and sentiment in Figurative Language. This model uses a K-fold cross-validation method to test how well it works and ensure it can be used in other situations. Using the CMU-MOSEI (Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity) database, we explore sentiment and emotion, considering various aspects like ironic, sarcastic, and subjective sentences in this study. We have four models in our deep learning analysis approach (audio emotion, text emotion, text sentiment, and audio sentiment). We suggested a single-tasking multimodal framework outperforming others to benefit from the interdependence of two related activities (sentiment and emotion). Our experiment was conducted using CNN and LSTM. Specific experimental findings show LSTM perform better than CNN, except for two text and audio sentiment models. We achieved 97% accuracy in the k-Fold Deep Learning Text Emotion and 91% in the k-Fold Deep Learning Audio Emotion in an experiment using the LSTM technique. In text and audio sentiment analysis, we got 93% and 78%, respectively. In conclusion, the Deep multimodal K-fold model is a promising way to analyze emotions and feelings in Figurative Language. It is a robust and reliable tool for this task because it can combine multiple methods and use K-fold cross-validation.
Keywords: Audio emotion recognition, Audio sentiment analysis, Text emotion recognition, Text sentiment analysis, LSTM, CNN