BS-GSA: Speech Emotion Recognition via Blend-Sample Empowered Global & Spectral Attentive Paradigm

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Speech emotion recognition (SER) is an indispensable part of understanding human intention in human-computer interaction systems. In this paper, we propose a novel Blend-Sample empowered Global & Spectral Attentive (BS-GSA) paradigm for more robust learning of global and spectral emotional features. The Global & Spectral Attentive (GSA) model captures global and positional information through its attentive design, so that it better learns emotional representations from long-term dependencies and spectral characteristics. In addition, we propose a blend-sample (BS) augmentation algorithm that balances the emotion samples and improves the learning of inconspicuous frame-level emotion cues. Experiments on the IEMOCAP dataset demonstrate that the proposed paradigm outperforms existing methods, reaching state-of-the-art performance.

Original language: English
Title of host publication: Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 155-158
Number of pages: 4
ISBN (Electronic): 9798350308693
State: Published - 2023
Event: 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023 - Jiangsu, China
Duration: 2 Nov 2023 to 4 Nov 2023

Publication series

Name: Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023

Conference

Conference: 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
Country/Territory: China
City: Jiangsu
Period: 2/11/23 to 4/11/23

Keywords

  • attention mechanism
  • convolutional neural network
  • human-computer interaction
  • speech emotion recognition
