Bidirectional Convolutional Recurrent Sparse Network (BCRSN): An Efficient Model for Music Emotion Recognition

Research output: Contribution to journal; Article; peer-review

93 Scopus citations

Abstract

Music emotion recognition, which enables effective and efficient music organization and retrieval, is a challenging subject in the field of music information retrieval. In this paper, we propose a new bidirectional convolutional recurrent sparse network (BCRSN) for music emotion recognition based on convolutional neural networks and recurrent neural networks. Our model adaptively learns the sequential-information-included affect-salient features (SII-ASF) from the 2-D time-frequency representation (i.e., spectrogram) of music audio signals. By combining feature extraction, ASF selection, and emotion prediction, the BCRSN can achieve continuous emotion prediction of audio files. To reduce the high computational complexity caused by the numerical-type ground truth, we propose a weighted hybrid binary representation (WHBR) method that converts the regression prediction process into a weighted combination of multiple binary classification problems. We test our method on two benchmark databases, that is, the Database for Emotional Analysis in Music and MoodSwings Turk. The results show that the WHBR method can greatly reduce the training time and improve the prediction accuracy. The extracted SII-ASF is robust to genre, timbre, and noise variation and is sensitive to emotion. It achieves significant improvement compared to the best performing feature sets in MediaEval 2015. Meanwhile, extensive experiments demonstrate that the proposed method outperforms the state-of-the-art methods.
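The abstract does not spell out the WHBR formulation, so the following is only a minimal sketch of the general idea it names: replacing one numeric regression target with several binary targets and recovering the value as a weighted combination of the binary outputs. The target range of [-1, 1], the 8-bit width, the power-of-two weights, and the function names `encode_bits` and `decode_bits` are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative only: a plain fixed-point binary code, NOT the paper's WHBR.
# Assumed: targets lie in [lo, hi], n_bits binary sub-problems, weights 2^k.

def encode_bits(value, n_bits=8, lo=-1.0, hi=1.0):
    """Quantize a numeric label into n_bits binary targets (MSB first)."""
    levels = (1 << n_bits) - 1                 # highest quantization index
    clipped = min(max(value, lo), hi)          # keep the label inside the range
    index = round((clipped - lo) / (hi - lo) * levels)
    return [(index >> k) & 1 for k in reversed(range(n_bits))]


def decode_bits(outputs, lo=-1.0, hi=1.0):
    """Recombine per-bit outputs (hard 0/1 or soft probabilities) into a value."""
    n_bits = len(outputs)
    levels = (1 << n_bits) - 1
    weights = [1 << k for k in reversed(range(n_bits))]   # 2^(n-1), ..., 2, 1
    index = sum(w * p for w, p in zip(weights, outputs))  # weighted combination
    return lo + (hi - lo) * index / levels


bits = encode_bits(0.3)          # eight binary classification targets
restored = decode_bits(bits)     # close to 0.3, up to quantization error
```

In a sketch like this, each bit would be predicted by its own binary classifier, and the reconstruction error from quantization alone is bounded by half of one quantization step, here (hi - lo) / (2 * 255).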

Original language: English
Article number: 8721099
Pages (from-to): 3150-3163
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 21
Issue number: 12
State: Published - Dec 2019

Keywords

  • bidirectional convolutional recurrent sparse network
  • Lasso regression
  • long short-term memory
  • Music emotion recognition
  • sequential-information-included affect-salient features selection
