TY - GEN
T1 - Singing Melody Extraction Based on Combined Frequency-Temporal Attention and Attentional Feature Fusion with Self-Attention
AU - Qi, Xi
AU - Tian, Lihua
AU - Li, Chen
AU - Song, Hui
AU - Yan, Jiahui
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Extracting the main melody from polyphonic music is a challenging task in music information retrieval. Traditional convolutional and recurrent neural networks have effectively improved performance on this task. In recent years, with the development of attention mechanisms in neural networks, the frequency and time attention information of audio has been fully exploited, and the amplitude properties of audio can also be better integrated with a good fusion module. This paper improves on the frequency-temporal attention of prior work. By extracting attention information with frequency-temporal attention and fusing the features additively, we obtain the combined frequency-temporal attention. We then apply attentional feature fusion based on multi-scale channel attention, and finally learn temporal dependencies through a self-attention module. Experimental results on four datasets demonstrate that our model outperforms existing models.
KW - feature fusion
KW - music information retrieval
KW - self-attention
KW - singing melody extraction
UR - https://www.scopus.com/pages/publications/85147539199
U2 - 10.1109/ISM55400.2022.00050
DO - 10.1109/ISM55400.2022.00050
M3 - Conference contribution
AN - SCOPUS:85147539199
T3 - Proceedings - 2022 IEEE International Symposium on Multimedia, ISM 2022
SP - 220
EP - 227
BT - Proceedings - 2022 IEEE International Symposium on Multimedia, ISM 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE International Symposium on Multimedia, ISM 2022
Y2 - 5 December 2022 through 7 December 2022
ER -