
Local attention and contrastive clustering network for sign language recognition

  • Xi'an Jiaotong University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Because the RGB videos used for sign language recognition have highly similar content, it is challenging to extract highly discriminative and orthogonal features. To address this, we propose a novel framework, the Local Attention and Contrastive Clustering Network for Sign Language Recognition (LACC-SLR), which enhances both global and fine-grained feature representation. Specifically, we introduce the Locality-Aware Attention MViT (LAA-MViT), which integrates a 3D Manhattan distance-based decay mechanism into attention computation, enabling the model to focus on spatiotemporally adjacent regions while maintaining global context. We also propose the Contrastive Label-Center Clustering (CLCC) module, which improves intra-class compactness and inter-class separability by aligning features with learnable class center vectors and applying label smoothing based on inter-class similarity. Furthermore, we adopt a Parallel Visual-Skeleton Framework (PVSF) that leverages both RGB videos and skeletal data, employing cross-modal attention for effective feature fusion. Extensive experiments on four benchmarks (WLASL, NMFs-CSL, AUTSL, and SLR500) demonstrate that our method consistently outperforms previous state-of-the-art approaches, achieving superior accuracy and generalization. Code is available at https://github.com/Shuanglin-1126/LACC-SLR.
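
As a rough illustration of what a 3D Manhattan distance-based decay in attention could look like, the sketch below builds a decay mask over a t x h x w token grid and uses it to reweight softmax attention. This is not taken from the LACC-SLR code: the function names, the exponential form gamma ** distance, the value of gamma, and the renormalisation step are all assumptions made for illustration, and the paper's actual formulation may differ.

```python
import itertools
import math


def manhattan_decay_mask(t, h, w, gamma=0.9):
    """Return an N x N matrix D (N = t*h*w) with D[i][j] = gamma ** d(i, j),
    where d is the 3D Manhattan distance between tokens i and j.
    The exponential form and gamma are illustrative assumptions."""
    coords = list(itertools.product(range(t), range(h), range(w)))
    return [
        [gamma ** sum(abs(a - b) for a, b in zip(p, q)) for q in coords]
        for p in coords
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def decayed_attention(scores, mask):
    """Row-wise softmax over raw attention scores, scaled element-wise by the
    decay mask and renormalised so each row sums to 1 (assumed design)."""
    out = []
    for score_row, mask_row in zip(scores, mask):
        weights = [a * d for a, d in zip(softmax(score_row), mask_row)]
        total = sum(weights)
        out.append([x / total for x in weights])
    return out
```

With uniform scores on a 2 x 2 x 2 grid, each token's attention weight falls off with the Manhattan distance to the other token. No weight is ever zeroed, so nearby regions are emphasised while global context is kept.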

Original language: English
Article number: 112941
Journal: Pattern Recognition
Volume: 173
DOI
Publication status: Published - May 2026
