TY - GEN
T1 - Self-adaptive Context and Modal-interaction Modeling For Multimodal Emotion Recognition
AU - Yang, Haozhe
AU - Gao, Xianqiang
AU - Wu, Jianlong
AU - Gan, Tian
AU - Ding, Ning
AU - Jiang, Feijun
AU - Nie, Liqiang
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - The multimodal emotion recognition in conversation task aims to predict the emotion label for a given utterance with its context and multiple modalities. Existing approaches achieve good results but also suffer from the following two limitations: 1) lacking modeling of diverse dependency ranges, i.e., long, short, and independent context-specific representations and without consideration of the different recognition difficulty for each utterance; 2) consistent treatment of the contribution for various modalities. To address the above challenges, we propose the Self-adaptive Context and Modal-interaction Modeling (SCMM) framework. We first design the context representation module, which consists of three submodules to model multiple contextual representations. Thereafter, we propose the modal-interaction module, including three interaction submodules to make full use of each modality. Finally, we come up with a self-adaptive path selection module to select an appropriate path in each module and integrate the features to obtain the final representation. Extensive experiments under four settings on three multimodal datasets, including IEMOCAP, MELD, and MOSEI, demonstrate that our proposed method outperforms the state-of-the-art approaches.
AB - The multimodal emotion recognition in conversation task aims to predict the emotion label for a given utterance with its context and multiple modalities. Existing approaches achieve good results but also suffer from the following two limitations: 1) lacking modeling of diverse dependency ranges, i.e., long, short, and independent context-specific representations and without consideration of the different recognition difficulty for each utterance; 2) consistent treatment of the contribution for various modalities. To address the above challenges, we propose the Self-adaptive Context and Modal-interaction Modeling (SCMM) framework. We first design the context representation module, which consists of three submodules to model multiple contextual representations. Thereafter, we propose the modal-interaction module, including three interaction submodules to make full use of each modality. Finally, we come up with a self-adaptive path selection module to select an appropriate path in each module and integrate the features to obtain the final representation. Extensive experiments under four settings on three multimodal datasets, including IEMOCAP, MELD, and MOSEI, demonstrate that our proposed method outperforms the state-of-the-art approaches.
UR - https://www.scopus.com/pages/publications/85175460982
U2 - 10.18653/v1/2023.findings-acl.390
DO - 10.18653/v1/2023.findings-acl.390
M3 - 会议稿件
AN - SCOPUS:85175460982
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6267
EP - 6281
BT - Findings of the Association for Computational Linguistics, ACL 2023
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -