Machine Learning Approach for Emotion Recognition in Speech
This paper presents a machine learning approach for emotion recognition in speech. The approach consists of three steps. First, numerical features are extracted from the sound database using an audio feature extractor. Then, a feature selection method is used to select the most relevant features. Finally, a machine learning model is trained to recognize seven universal emotions: anger, fear, sadness, happiness, boredom, disgust, and neutral. A thorough machine learning experimental analysis is
performed for each step. The results showed that 300 (out of 1582) features, as ranked by the gain ratio,
are sufficient for achieving 86% accuracy when evaluated with 10-fold cross-validation. SVM achieved
the highest accuracy when compared to KNN and Naive Bayes. We additionally compared the accuracy
of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA (optimized
algorithm parameters) using the leave-one-speaker-out technique. The results showed that the SVM
enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM (77% vs. 73%, respectively). Finally, the results achieved with 10-fold cross-validation are comparable to those achieved by humans, i.e., 86% accuracy in both cases. Moreover, low-energy emotions (boredom, sadness, and disgust) are recognized better by our machine learning approach than by humans.
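The gain-ratio ranking used for feature selection can be sketched in plain Python. This is a minimal illustration assuming discretized feature values and a toy dataset; the feature names and data here are hypothetical, not from the paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature, labels):
    """Gain ratio of a discretized feature: information gain
    divided by the feature's own entropy (split information)."""
    n = len(labels)
    # Group class labels by feature value to get H(Y|X)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    cond_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond_entropy
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: two discretized features, rank them by gain ratio
X = [["a", "a", "b", "b"], ["a", "b", "a", "b"]]
y = ["anger", "anger", "joy", "joy"]
ranked = sorted(range(len(X)), key=lambda i: gain_ratio(X[i], y), reverse=True)
```

In the paper's setting, the 1582 extracted features would be ranked this way and only the top 300 kept for training.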
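The leave-one-speaker-out technique used to compare the standard and Auto-WEKA-tuned SVMs holds out all utterances of one speaker per fold, so the model is always evaluated on a speaker it has never seen. A minimal sketch with hypothetical speaker IDs:

```python
def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices) per fold,
    holding out every utterance of one speaker at a time."""
    for held_out in sorted(set(speakers)):
        test = [i for i, s in enumerate(speakers) if s == held_out]
        train = [i for i, s in enumerate(speakers) if s != held_out]
        yield held_out, train, test

# Toy example: 6 utterances from 3 speakers
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_speaker_out(speakers))
```

This evaluation is stricter than 10-fold cross-validation, where utterances from the same speaker can appear in both training and test sets, which helps explain the lower accuracies (73%/77% vs. 86%).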
This work is licensed under a Creative Commons Attribution 3.0 License.