A Hierarchical Speech Emotion Classification Framework based on Joint Triplet-Center Loss

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic speech emotion recognition is crucial to the development of human-computer interaction systems. However, the ambiguity of emotion categories and the subjectivity of human annotations make it hard to extract discriminative emotional features and to improve classification accuracy. In this paper, we propose a hierarchical learning method based on a Joint Triplet-Center Loss. On the one hand, the proposed Joint Triplet-Center Loss function learns discriminative emotional features by reducing the intra-class distance and increasing the inter-class distance. On the other hand, the hierarchical learning method enhances the stability of the model by taking the consistency of annotations into account. Experimental results show that the proposed method clearly improves performance over previous works and generalizes better.
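The abstract states that the loss reduces intra-class distance and increases inter-class distance, but it does not give the exact formulation. As an illustration only, the sketch below implements the generic triplet-center loss from the metric-learning literature (He et al., 2018): each embedding is pulled toward its own class center and pushed at least a `margin` farther away from the nearest other center. Pairing it with softmax cross-entropy through a weight `lam` is an assumption about what "joint" means here; `joint_loss`, `margin` and `lam` are hypothetical names, not the authors' code. The centers are passed in as fixed arrays for clarity, whereas in training they are usually learned alongside the network.

```python
import numpy as np


def squared_distances(features, centers):
    """Squared Euclidean distance from each feature (N, D) to each center (K, D) -> (N, K)."""
    diff = features[:, None, :] - centers[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def triplet_center_loss(features, labels, centers, margin=1.0):
    """Generic triplet-center loss (assumed form, not the paper's exact loss):
    mean_i max(0, d(f_i, c_{y_i}) + margin - min_{j != y_i} d(f_i, c_j)).
    Requires at least two classes (K >= 2)."""
    d = squared_distances(features, centers)   # (N, K) distances to every center
    idx = np.arange(features.shape[0])
    pos = d[idx, labels]                       # distance to the sample's own class center
    d_other = d.copy()
    d_other[idx, labels] = np.inf              # mask out the own-class center
    neg = d_other.min(axis=1)                  # distance to the nearest other center
    return float(np.mean(np.maximum(0.0, pos + margin - neg)))


def softmax_cross_entropy(logits, labels):
    """Numerically stable mean softmax cross-entropy for integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def joint_loss(logits, features, labels, centers, margin=1.0, lam=0.1):
    """Hypothetical joint objective: cross-entropy + lam * triplet-center loss."""
    return softmax_cross_entropy(logits, labels) + lam * triplet_center_loss(
        features, labels, centers, margin
    )


if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [1.0, 1.0]])
    features = np.array([[0.0, 0.0], [1.0, 1.0]])
    # Each feature sits on its own center: the hinge is inactive, loss is 0.
    print(triplet_center_loss(features, np.array([0, 1]), centers))
    # Labels swapped: each feature sits on the wrong center, loss is 2 + 1 - 0 = 3.
    print(triplet_center_loss(features, np.array([1, 0]), centers))
```

The margin term is what separates this from a plain center loss: a center loss only shrinks intra-class distance, while the hinge against the nearest other center also enforces the inter-class gap the abstract describes.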

Original language: English
Title of host publication: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 751-756
Number of pages: 6
ISBN (Electronic): 9781728168968
State: Published - 23 Oct 2020
Event: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020 - Virtual, Nanjing, China
Duration: 23 Oct 2020 - 25 Oct 2020

Publication series

Name: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020

Conference

Conference: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020
Country/Territory: China
City: Virtual, Nanjing
Period: 23/10/20 - 25/10/20

Keywords

  • annotations
  • discriminative emotional features
  • Joint Triplet-Center Loss
  • speech emotion recognition
