Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking

Research output: Contribution to journal › Article › peer-review


Abstract

Multi-Object Tracking (MOT) remains a vital component of intelligent video analysis. It aims to locate targets and maintain a consistent identity for each target throughout a video sequence. Existing works usually learn discriminative feature representations, such as motion and appearance cues, to associate detections across frames, but in practice these features are easily affected by mutual occlusion and background clutter. In this paper, we propose a simple yet effective two-stage feature learning paradigm that jointly learns single-shot and multi-shot features for different targets, so as to achieve robust data association during tracking. For detections that have not yet been associated, we design a novel single-shot feature learning module that extracts discriminative features from each detection, enabling efficient association of targets between adjacent frames. For tracklets that have been lost for several frames, we design a novel multi-shot feature learning module that extracts discriminative features from each tracklet, enabling these lost targets to be accurately recovered even after a long period. Equipped with a simple data association logic, the resulting VisualTracker performs robust MOT based on the single-shot and multi-shot feature representations. Extensive experimental results demonstrate that our method achieves significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
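The abstract describes the association logic only at a high level. To illustrate the general idea of a two-stage cascade, here is a minimal sketch; it is NOT the authors' VisualTracker implementation. Stage 1 matches current detections to active tracks using a per-detection ("single-shot") embedding. Stage 2 matches the leftover detections to lost tracklets using a tracklet-level ("multi-shot") embedding. The sketch rests on several assumptions. Embeddings are given as plain vectors, so the learned feature modules are out of scope. The multi-shot feature is taken to be the mean of a tracklet's stored embeddings. Matching is greedy on cosine distance rather than Hungarian assignment. The names `Track`, `associate`, `greedy_match`, `thr_single`, and `thr_multi`, along with the threshold values, are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def greedy_match(cost, threshold):
    """Greedily pick the cheapest (row, col) pairs whose cost <= threshold.
    (Simplification: real trackers often use Hungarian assignment instead.)"""
    pairs = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_rows, used_cols, matches = set(), set(), []
    for c, i, j in pairs:
        if c > threshold:
            break  # remaining pairs are all more expensive
        if i in used_rows or j in used_cols:
            continue
        matches.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
    return matches


@dataclass
class Track:
    """Hypothetical track record holding a history of per-frame embeddings."""
    track_id: int
    features: list = field(default_factory=list)  # must contain >= 1 embedding

    def single_shot(self):
        # Assumption: the most recent embedding stands in for the single-shot feature.
        return self.features[-1]

    def multi_shot(self):
        # Assumption: mean-pool the stored embeddings as the multi-shot feature.
        n = len(self.features)
        return [sum(col) / n for col in zip(*self.features)]


def associate(active, lost, detections, thr_single=0.3, thr_multi=0.4):
    """Two-stage cascade: returns ({track_id: detection_index}, unmatched_indices)."""
    # Stage 1: active tracks vs. all detections (adjacent-frame association).
    cost1 = [[cosine_distance(t.single_shot(), d) for d in detections] for t in active]
    m1 = greedy_match(cost1, thr_single)
    used = {j for _, j in m1}
    remaining = [j for j in range(len(detections)) if j not in used]

    # Stage 2: lost tracklets vs. leftover detections (long-term recovery).
    cost2 = [[cosine_distance(t.multi_shot(), detections[j]) for j in remaining] for t in lost]
    m2 = greedy_match(cost2, thr_multi)

    result = {active[i].track_id: j for i, j in m1}
    result.update({lost[i].track_id: remaining[k] for i, k in m2})
    recovered = {remaining[k] for _, k in m2}
    unmatched = [j for j in remaining if j not in recovered]
    return result, unmatched


if __name__ == "__main__":
    active = [Track(1, [[1.0, 0.0]])]
    lost = [Track(2, [[0.0, 1.0], [0.2, 1.0]])]
    detections = [[0.9, 0.1], [0.1, 1.0], [-1.0, 0.0]]
    print(associate(active, lost, detections))
```

In this toy run, detection 0 is linked to active track 1 in stage 1. Detection 1 is recovered for lost track 2 in stage 2 via its mean embedding. Detection 2 stays unmatched and would typically seed a new track. Running the cheaper adjacent-frame stage first keeps the tracklet-level comparison limited to the detections that stage 1 could not explain.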

Original language: English
Pages (from-to): 9515-9526
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 26
State: Published - 2024

Keywords

  • Multi-object tracking
  • data association
  • discriminative feature learning
