Intelligent Music Content Generation Model Based on Multimodal Situational Sentiment Perception
Abstract
To further examine the interrelationship among music, emotion, and scene, and to provide new technical support for music creation, this study designed a deep-learning-based multimodal sentiment analysis model for auditory and visual features. Building on this model, a new music content generation model was proposed that improves upon the traditional Transformer architecture. The experimental results indicated that the minimum mean absolute error, root mean square error, and mean absolute percentage error of the proposed multimodal sentiment analysis architecture were 0.149, 0.166, and 0.140, respectively, and the maximum R-squared value was 0.961. The multimodal sentiment analysis dataset constructed for the experiments effectively improved the performance of the model. The model performed well on both the precision-recall curve and the receiver operating characteristic curve, with sentiment recognition accuracy of up to 0.98 and high recognition efficiency. Meanwhile, the music generated by the improved Transformer structure was closest to the dataset in terms of pitch and melody variation, with a minimum difference of 0.86%, and also performed better in smoothness, coherence, and completeness. When the model was used for music generation, the hit rate and normalized discounted cumulative gain reached up to 93.984% and 91.566%, respectively, and the mean reciprocal rank reached up to 0.89. This study deepens the understanding of the mechanism of music emotion generation, captures the emotion and context of music more accurately, and promotes the development of affective computing and sentiment recognition.
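The abstract quotes regression-style error metrics (MAE, RMSE, MAPE, R-squared) for the sentiment analysis architecture and ranking metrics (hit rate, NDCG, mean reciprocal rank) for the generated music. The sketch below is not the authors' evaluation code; it only illustrates the conventional definitions of these metrics on hypothetical data, as an aid to interpreting the reported values.

```python
# Minimal sketch (hypothetical data, conventional metric definitions only).
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (assumes non-zero targets)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def hit_rate_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(1 for ranked, rel in zip(ranked_lists, relevant_sets)
               if any(item in rel for item in ranked[:k]))
    return hits / len(ranked_lists)

def ndcg_at_k(ranked_lists, relevant_sets, k):
    """Normalized discounted cumulative gain with binary relevance."""
    scores = []
    for ranked, rel in zip(ranked_lists, relevant_sets):
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, item in enumerate(ranked[:k]) if item in rel)
        ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(rel), k)))
        scores.append(dcg / ideal if ideal > 0 else 0.0)
    return sum(scores) / len(scores)

def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank: average of 1 / rank of the first relevant item."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for i, item in enumerate(ranked):
            if item in rel:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_lists)

if __name__ == "__main__":
    # Hypothetical emotion-regression outputs and ranked music lists,
    # used only to exercise the metric functions.
    y_true = [0.8, 0.5, 0.9, 0.3]
    y_pred = [0.7, 0.6, 0.85, 0.35]
    print("MAE:", mae(y_true, y_pred), "RMSE:", rmse(y_true, y_pred))
    print("MAPE:", mape(y_true, y_pred), "R^2:", r_squared(y_true, y_pred))

    ranked = [["a", "b", "c"], ["d", "e", "f"]]
    relevant = [{"b"}, {"d"}]
    print("HR@3:", hit_rate_at_k(ranked, relevant, 3))
    print("NDCG@3:", ndcg_at_k(ranked, relevant, 3))
    print("MRR:", mrr(ranked, relevant))
```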
DOI: https://doi.org/10.31449/inf.v49i5.6846
This work is licensed under a Creative Commons Attribution 3.0 License.