Cross-Modal Sentiment Analysis on Social Media using Improved Nonverbal Representation Learning and GHRNN Fusion

Minglun Xue, Dongyang Wang

Abstract


Traditional sentiment analysis methods focus mainly on textual data, yet human emotion is multidimensional and is also conveyed through sound, body language, and other nonverbal cues. Multi-modal data therefore allows a deeper and broader understanding of how sentiment is expressed. To address this, a feature extraction and analysis method for emotional information based on an improved nonverbal representation learning network and multi-modal data (Improved Nonverbal Representation Learning Network and Multi-modal Data, INPNRLN-MD) is proposed. On this basis, an improved multi-modal data fusion method based on a Gated Hierarchical Recurrent Neural Network with Cross-Modal Attention (GHRNN-CMA) is designed for the fusion stage. Compared with traditional baselines, INPNRLN-MD extracts text features with the BERT model and processes audio and video data with ELN, capturing emotional information in multi-modal data more effectively. The cross-modal attention mechanism of GHRNN-CMA strengthens the interaction between modalities and improves the accuracy of emotion recognition. The performance of the model is validated on the CMU-MOSI and CMU-MOSEI datasets using the F1 score, Pearson correlation, mean absolute error (MAE), and binary accuracy (Acc-2). Training was performed on a single NVIDIA GTX TITAN X GPU with 12 GB of VRAM, and the model converged with a batch size of only 32; the inference stage is computationally light and can be deployed on ordinary cloud servers or edge devices. The results show that, compared with mainstream algorithms, the proposed method performs best, with F1 score, Pearson correlation, mean absolute error, and binary accuracy reaching 83.06/85.12, 0.803, 0.696, and 83.17/85.23 on the two datasets, respectively. Relative to the most recent comparison method, the improved multi-modal fusion method improves mean absolute error, Pearson correlation, F1 score, and binary accuracy by 1.0%, 14.67%, 3.1%, and 3.3%, respectively. These results indicate that the proposed methods help perceive and analyze human emotions, which is useful for understanding and predicting human behavior and is of significance for maintaining social relationships and improving social governance.
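
The abstract names cross-modal attention and gated recurrent fusion as the core of GHRNN-CMA but gives no implementation details. The following is a minimal PyTorch sketch, not the authors' code, of how such a fusion step could look: the module names, feature dimensions, single GRU layer, and regression head are all illustrative assumptions.

```python
# Hypothetical sketch of a cross-modal attention + gated recurrent fusion step.
# Text features (e.g. from BERT, 768-d) attend over nonverbal (audio/video) features;
# a GRU stands in for the gated recurrent fusion stage described in the abstract.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, text_dim=768, nonverbal_dim=128, fused_dim=256, num_heads=4):
        super().__init__()
        # Project both modalities into a shared space before attention.
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.nonverbal_proj = nn.Linear(nonverbal_dim, fused_dim)
        # Text queries attend over nonverbal keys and values (cross-modal interaction).
        self.attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        # Recurrent aggregation of the attended sequence.
        self.gru = nn.GRU(fused_dim, fused_dim, batch_first=True)
        # Sentiment intensity regression head (CMU-MOSI/MOSEI label range is [-3, 3]).
        self.head = nn.Linear(fused_dim, 1)

    def forward(self, text_feats, nonverbal_feats):
        # text_feats: (batch, text_len, text_dim); nonverbal_feats: (batch, nv_len, nonverbal_dim)
        q = self.text_proj(text_feats)
        kv = self.nonverbal_proj(nonverbal_feats)
        attended, _ = self.attn(q, kv, kv)   # cross-modal attention
        _, h = self.gru(attended)            # gated recurrent fusion over time
        return self.head(h[-1])              # predicted sentiment score

# Usage with random tensors standing in for real extracted features.
model = CrossModalAttentionFusion()
text = torch.randn(32, 50, 768)        # batch of 32 utterances, 50 text tokens each
nonverbal = torch.randn(32, 120, 128)  # 120 aligned audio/video frames each
print(model(text, nonverbal).shape)    # torch.Size([32, 1])
```

In this sketch the text stream supplies the queries while the nonverbal stream supplies keys and values, so the attention weights model how much each audio/video frame contributes to each word, which is one common way to realize the cross-modal interaction the abstract describes.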




DOI: https://doi.org/10.31449/inf.v49i34.9130

This work is licensed under a Creative Commons Attribution 3.0 License.