TY - JOUR
T1 - Exploring Action Centers for Temporal Action Localization
AU - Xia, Kun
AU - Wang, Le
AU - Shen, Yichao
AU - Zhou, Sanpin
AU - Hua, Gang
AU - Tang, Wei
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Temporal action localization aims to detect the temporal intervals of human actions in untrimmed videos. Most previous methods rely on locating and matching the start and end times of actions. However, action boundaries are inherently ambiguous and uncertain, which leads to inaccurate action localization and many false positives. In this paper, we introduce a new framework for temporal action localization. It explicitly models temporal action centers to reduce unreliable action detection results caused by ambiguous action boundaries. Since action centers are highly related to semantic actions, they can be detected more reliably than the conventional action boundaries. As a result, our framework can exclude false positives and promote high-quality proposals. Based on action centers, we propose a triplet feature fusion mechanism. It performs neural message passing among the boundaries, the center, and contextual regions outside the proposal to enrich the proposal's representation. In addition, we introduce a centerness scoring method to suppress proposals that deviate from the centers of action instances. Consequently, our network can retrieve high-quality action proposals and locate actions more precisely. Experimental results show that our method outperforms state-of-the-art methods on the THUMOS14 and ActivityNet v1.3 datasets.
KW - Temporal action detection
KW - Temporal action localization
KW - Temporal action proposal generation
UR - https://www.scopus.com/pages/publications/85149464310
DO - 10.1109/TMM.2023.3252176
M3 - Article
AN - SCOPUS:85149464310
SN - 1520-9210
VL - 25
SP - 9425
EP - 9436
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -