Intelligent Music Content Generation Model Based on Multimodal Situational Sentiment Perception
Abstract
To further examine the interrelationship among music, emotion, and scene, and to provide new technical support for music creation, this study designed a deep-learning-based multimodal sentiment analysis model for auditory and visual features. Building on this model, a new music content generation model was proposed that improves upon the traditional Transformer architecture. The experimental results indicated that the minimum mean absolute error, root mean square error, and mean absolute percentage error of the proposed multimodal sentiment analysis architecture were 0.149, 0.166, and 0.140, respectively, and the maximum R-squared value was 0.961. The multimodal sentiment analysis dataset constructed for the experiments effectively improved the performance of the model. The model performed well on both the precision-recall curve and the receiver operating characteristic curve, with sentiment recognition accuracy of up to 0.98 and high recognition efficiency. Meanwhile, the music generated by the improved Transformer structure was closest to the dataset in terms of pitch and melody variation, with a minimum difference of 0.86%, and also performed better in smoothness, coherence, and completeness. When the model was used for music generation, the hit rate and normalized discounted cumulative gain reached up to 93.984% and 91.566%, respectively, and the mean reciprocal rank reached up to 0.89. This study deepens the understanding of the mechanism of music emotion generation, captures the emotion and context of music more accurately, and promotes the development of affective computing and sentiment recognition.
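The abstract quotes regression-style error metrics (MAE, RMSE, MAPE, R-squared) for the sentiment analysis architecture and ranking metrics (hit rate, NDCG, mean reciprocal rank) for the generated music. The sketch below is not the authors' evaluation code; it only illustrates the conventional definitions of these metrics on hypothetical data, as an aid to interpreting the reported values.

```python
# Minimal sketch (hypothetical data, conventional metric definitions only).
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (assumes non-zero targets)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def hit_rate_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(1 for ranked, rel in zip(ranked_lists, relevant_sets)
               if any(item in rel for item in ranked[:k]))
    return hits / len(ranked_lists)

def ndcg_at_k(ranked_lists, relevant_sets, k):
    """Normalized discounted cumulative gain with binary relevance."""
    scores = []
    for ranked, rel in zip(ranked_lists, relevant_sets):
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, item in enumerate(ranked[:k]) if item in rel)
        ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
        scores.append(dcg / ideal if ideal > 0 else 0.0)
    return sum(scores) / len(scores)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank: average of 1 / rank of the first relevant item."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for i, item in enumerate(ranked):
            if item in rel:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_lists)

if __name__ == "__main__":
    # Hypothetical emotion-regression outputs and ranked music lists,
    # used only to exercise the metric functions.
    y_true = [0.8, 0.5, 0.9, 0.3]
    y_pred = [0.7, 0.6, 0.85, 0.35]
    print("MAE:", mae(y_true, y_pred), "RMSE:", rmse(y_true, y_pred))
    print("MAPE:", mape(y_true, y_pred), "R^2:", r_squared(y_true, y_pred))

    ranked = [["a", "b", "c"], ["d", "e", "f"]]
    relevant = [{"b"}, {"d"}]
    print("HR@3:", hit_rate_at_k(ranked, relevant, 3))
    print("NDCG@3:", ndcg_at_k(ranked, relevant, 3))
    print("MRR:", mrr(ranked, relevant))
```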
DOI: https://doi.org/10.31449/inf.v49i5.6846
This work is licensed under a Creative Commons Attribution 3.0 License.