TY - GEN
T1 - Multiple Object Tracking by Trajectory Map Regression with Temporal Priors Embedding
AU - Wan, Xingyu
AU - Zhou, Sanping
AU - Wang, Jinjun
AU - Meng, Rongye
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Prevailing Multiple Object Tracking (MOT) works following the Tracking-by-Detection (TBD) paradigm pay most attention to either object detection in a first step or data association in a second step. In this paper, we approach the MOT problem from a different perspective by directly obtaining the embedded spatial-temporal information of trajectories from raw video data. For this purpose, we propose a joint trajectory locating and attribute encoding framework for real-time, online MOT. We first introduce a trajectory attribute representation scheme designed for each tracked target (instead of each object), where the extracted Trajectory Map (TM) encodes the spatial-temporal attributes of a trajectory across a window of consecutive video frames. Next, we present a Temporal Priors Embedding (TPE) methodology to infer these attributes with a logical reasoning strategy based on long-term feature dynamics. The proposed MOT framework projects multiple attributes of tracked targets (e.g., presence, enter/exit, location, scale, and motion) into a continuous TM to perform one-shot regression for real-time MOT. Experimental results show that our proposed video-based method runs at 33 FPS and is more accurate and robust than detection-based tracking methods and a few other State-of-the-Art (SOTA) approaches on the MOT16/17/20 benchmarks.
KW - multi-object tracking
KW - occlusion-aware radius
KW - temporal priors embedding
KW - trajectory map
UR - https://www.scopus.com/pages/publications/85119354880
U2 - 10.1145/3474085.3475304
DO - 10.1145/3474085.3475304
M3 - Conference contribution
AN - SCOPUS:85119354880
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 1377
EP - 1386
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 29th ACM International Conference on Multimedia, MM 2021
Y2 - 20 October 2021 through 24 October 2021
ER -