TY - GEN
T1 - A Hierarchical Speech Emotion Classification Framework based on Joint Triplet-Center Loss
AU - Yang, Xinyu
AU - Xia, Xiaojing
AU - Dong, Yizhuo
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10/23
Y1 - 2020/10/23
AB - Automatic speech emotion recognition is crucial to the development of human-computer interaction systems. However, the ambiguity of emotion categories and the subjectivity of human annotations make it hard to extract discriminative emotional features and to improve classification accuracy. In this paper, we propose a hierarchical learning method based on a Joint Triplet-Center Loss. On the one hand, the proposed Joint Triplet-Center Loss function learns discriminative emotional features by reducing the intra-class distance and increasing the inter-class distance. On the other hand, the hierarchical learning method enhances the stability of the model by taking the consistency of annotations into account. Experimental results show that the proposed method clearly improves performance over previous work and generalizes better.
KW - annotations
KW - discriminative emotional features
KW - Joint Triplet-Center Loss
KW - speech emotion recognition
UR - https://www.scopus.com/pages/publications/85101140371
U2 - 10.1109/ICSIP49896.2020.9339353
DO - 10.1109/ICSIP49896.2020.9339353
M3 - Conference contribution
AN - SCOPUS:85101140371
T3 - 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
SP - 751
EP - 756
BT - 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020
Y2 - 23 October 2020 through 25 October 2020
ER -