TY - JOUR
T1 - Tracking beyond Detection
T2 - Learning a Global Response Map for End-to-End Multi-Object Tracking
AU - Wan, Xingyu
AU - Cao, Jiakai
AU - Zhou, Sanping
AU - Wang, Jinjun
AU - Zheng, Nanning
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2021
Y1 - 2021
AB - Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection and Data Association paradigm, in which objects are first detected and then associated in the tracking process. In recent years, deep neural networks have been utilized to obtain more discriminative appearance features for cross-frame association, and noticeable performance improvements have been reported. On the other hand, the Tracking-by-Detection framework is not yet completely end-to-end, which leads to heavy computation and limited performance, especially in the inference (tracking) process. To address this problem, we present an effective end-to-end deep learning framework that directly takes an image sequence or video as input and outputs the located and tracked objects of learned types. Specifically, a novel global response network is learned to project multiple objects in the image sequence or video into a continuous response map, from which the trajectory of each tracked object can then be easily picked out. The overall process is similar to how a detector takes an image as input and outputs the bounding box of each detected object. Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
KW - Multi-object tracking
KW - deep neural network
KW - global response map
UR - https://www.scopus.com/pages/publications/85115669608
U2 - 10.1109/TIP.2021.3113169
DO - 10.1109/TIP.2021.3113169
M3 - Article
C2 - 34550886
AN - SCOPUS:85115669608
SN - 1057-7149
VL - 30
SP - 8222
EP - 8235
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -