Voxel-Based Multi-Scale Transformer Network for Event Stream Processing

  • Southeast University, Nanjing

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

Event cameras are bio-inspired dynamic vision sensors that are superior to frame-based cameras in terms of low power consumption, high dynamic range, and high temporal resolution in computer vision tasks. Recent advances in voxel-based representation learning have successfully exploited the sparsity of events with low computational complexity, but face challenges in extracting spatio-temporal features within voxels and representative global dependencies between voxels, thus limiting their representation power. In this work, towards a better trade-off between accuracy and computation overhead, we propose a novel voxel-based multi-scale transformer network (VMST-Net) to process event streams. Specifically, VMST-Net projects events within voxels into multi-channel frames along the time axis, such that 2D convolutions can be leveraged to encode spatio-temporal features in voxels. Then, VMST-Net utilizes a novel multi-scale multi-head self-attention (MSMHSA) mechanism with a multi-scale fusion (MSF) module that allows different heads within each layer to attend to 3D neighborhoods at different scales, adaptively aggregating the coarse-to-fine voxel features with little computational cost and few parameters. Moreover, to model effective global features while saving computations, we aggregate features in a local-to-global manner by enlarging the coverage of 3D neighborhoods as the network gets deeper. Extensive experimental results on benchmark datasets demonstrate that our model advances state-of-the-art accuracy with low model complexity and computational complexity in all three visual tasks, including object classification, action recognition, and human pose estimation.
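
The projection step described in the abstract (events inside each voxel are flattened into a multi-channel frame along the time axis) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `voxelize_events`, the `(x, y, t)` event layout, the parameters `voxel_hw`, `voxel_t`, and `time_bins`, and the use of plain event counts as channel values are all assumptions made for illustration. Polarity is ignored for brevity.

```python
import numpy as np


def voxelize_events(events, voxel_hw, voxel_t, time_bins):
    """Group events into space-time voxels and project each non-empty voxel
    into a multi-channel 2D patch whose channels are time bins.

    Hypothetical helper, not taken from the paper.

    events    : (N, 3+) array, columns assumed to be x, y, t (extra columns ignored)
    voxel_hw  : (height, width) of a voxel in pixels
    voxel_t   : temporal extent of a voxel, in the same unit as t
    time_bins : number of channels each voxel is split into along time

    Returns
    coords  : (V, 3) integer indices (vx, vy, vt) of the non-empty voxels
    patches : (V, time_bins, height, width) per-bin event counts
    """
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    vh, vw = voxel_hw

    # Index of the voxel that contains each event.
    vx, vy = x // vw, y // vh
    t_rel = t - t.min()
    vt = np.floor(t_rel / voxel_t).astype(np.int64)

    # Position of each event inside its voxel (pixel offset and time bin).
    lx, ly = x % vw, y % vh
    frac = t_rel - vt * voxel_t
    ch = np.clip((frac / voxel_t * time_bins).astype(np.int64), 0, time_bins - 1)

    # Keep only non-empty voxels: this is where event sparsity is exploited.
    keys = np.stack([vx, vy, vt], axis=1)
    coords, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # shape differs across NumPy versions

    # Accumulate event counts; np.add.at handles repeated indices correctly.
    patches = np.zeros((len(coords), time_bins, vh, vw), dtype=np.float32)
    np.add.at(patches, (inverse, ch, ly, lx), 1.0)
    return coords, patches
```

Each row of `patches` is an ordinary multi-channel image, so a standard 2D convolution can be applied to it, while `coords` keeps the 3D position of the voxel that an attention layer over 3D neighborhoods would need.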

Original language: English
Pages (from-to): 2112-2124
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 4
DOI
Publication status: Published - 1 Apr 2024
Externally published
