Affect-Salient Event Sequences Modelling for Continuous Speech Emotion Recognition Using Connectionist Temporal Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Continuous speech emotion recognition faces two challenges: delays caused by reaction time, which are inherent in human annotations, and noise caused by non-emotional segments. To address these, we propose an affect-salient event sequences modelling (ASESM) method based on connectionist temporal classification (CTC). The proposed method treats a sentence's label sequence as a chain of affect-salient event (ASE) states and Null (i.e., non-affect-salient event) states, and trains a CTC-based convolutional neural network (CNN) to automatically label the sentence's emotional segments as ASE and its non-emotional segments as Null. For test samples, the continuous arousal and valence annotations of each ASE are then used to assign an emotional value to each segment predicted as that ASE. Our method avoids reaction delay compensation by using events as the target, and it reduces the impact of noise by using CTC. Experimental results on the RECOLA dataset demonstrate the effectiveness of our method compared to state-of-the-art speech-only methods.
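The labelling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the network has already produced one label per frame, that Null shares index 0 with the CTC blank, and that each ASE has a single stored (arousal, valence) pair. The names `ctc_collapse`, `ase_segments`, `assign_emotion`, and the numbers in `ase_values` are all hypothetical.

```python
from itertools import groupby

# Assumption: index 0 stands for Null (non-affect-salient) and doubles as the CTC blank.
NULL = 0


def ctc_collapse(frame_labels, blank=NULL):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def ase_segments(frame_labels, blank=NULL):
    """Return (label, start, end) for each run of non-Null frames; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != blank:
            segments.append((label, start, start + length))
        start += length
    return segments


def assign_emotion(frame_labels, ase_values, blank=NULL):
    """Attach the stored (arousal, valence) pair of each ASE to its predicted segments."""
    return [
        (start, end, ase_values[label])
        for label, start, end in ase_segments(frame_labels, blank)
    ]


# Hypothetical per-frame predictions and made-up (arousal, valence) values per ASE.
frames = [0, 0, 1, 1, 1, 0, 2, 2, 0, 0, 1]
ase_values = {1: (0.4, 0.1), 2: (-0.2, 0.3)}

print(ctc_collapse(frames))                # [1, 2, 1]
print(assign_emotion(frames, ase_values))  # [(2, 5, (0.4, 0.1)), (6, 8, (-0.2, 0.3)), (10, 11, (0.4, 0.1))]
```

Because the target is the collapsed event sequence rather than a frame-aligned trace, the sketch never needs to shift annotations in time, which mirrors the abstract's point about avoiding reaction delay compensation.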

Original language: English
Title of host publication: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 773-778
Number of pages: 6
ISBN (Electronic): 9781728168968
State: Published - 23 Oct 2020
Event: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020 - Virtual, Nanjing, China
Duration: 23 Oct 2020 → 25 Oct 2020

Publication series

Name: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020

Conference

Conference: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020
Country/Territory: China
City: Virtual, Nanjing
Period: 23/10/20 → 25/10/20

Keywords

  • affect responses
  • affect-salient events
  • connectionist temporal classification
  • continuous speech emotion recognition
