
VaF-LangSplat: Voxel-Aware Fusion Language Gaussian Splatting

  • Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Efficient and precise open-vocabulary 3D scene segmentation remains a critical challenge in computer vision. Current leading methods encode CLIP language features into 3D Gaussians to achieve high segmentation accuracy and fast inference. However, they suffer from point ambiguity caused by training separately on each level of multi-level 2D semantic masks. This separate training not only compromises time and space efficiency but also degrades accuracy when selecting the optimal semantic level. To overcome these limitations, we propose Voxel-Aware Fusion Language Gaussian Splatting (VaF-LangSplat), a novel framework that jointly optimizes geometric and semantic representations. Our approach first voxelizes 3D Gaussians using sparse point clouds and lightweight MLP decoders, effectively disentangling language features from geometric attributes. This enables simultaneous training across arbitrary semantic levels with minimal overhead. Crucially, we introduce Fusion Language Splatting, which aligns geometric and multi-level semantic distributions to sharpen boundary definitions while eliminating redundant Gaussian expansions. The voxel-aware representation further improves robustness to motion blur and lighting variations. Experiments on open-vocabulary 3D localization and segmentation tasks demonstrate that VaF-LangSplat outperforms LangSplat (the prior state of the art) in both segmentation/localization accuracy and efficiency, with 4× faster training and 15× lower storage requirements.
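The abstract describes two mechanisms only at a high level: voxelizing Gaussians from a sparse point cloud, and decoding language features with a lightweight MLP so that every semantic level is trained at once. The sketch below illustrates that general idea in plain Python under stated assumptions. It is not the authors' implementation. Anchors are taken as the centres of occupied voxels (an assumed, Scaffold-GS-style scheme). The decoder weights are random and untrained. The names `voxelize_points`, `LanguageDecoder` and `best_level`, the feature sizes, and the three-level split are all hypothetical. Real CLIP embeddings would be much larger, for example 512-dimensional.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize_points(points: Sequence[Point], voxel_size: float) -> List[Point]:
    """Hypothetical helper: quantize a sparse point cloud onto a regular grid
    and return one anchor (the voxel centre) per occupied voxel."""
    occupied = set()
    for x, y, z in points:
        occupied.add((math.floor(x / voxel_size),
                      math.floor(y / voxel_size),
                      math.floor(z / voxel_size)))
    # Sorted for deterministic output; centre = (index + 0.5) * voxel_size.
    return [tuple((i + 0.5) * voxel_size for i in idx) for idx in sorted(occupied)]


class LanguageDecoder:
    """Hypothetical lightweight MLP: one anchor feature -> one unit-norm
    language embedding per semantic level, all produced in a single pass."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, levels: int, seed: int = 0):
        rng = random.Random(seed)  # untrained, randomly initialised weights
        self.levels, self.out_dim = levels, out_dim
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(levels * out_dim)]

    @staticmethod
    def _matvec(w: List[List[float]], v: Sequence[float]) -> List[float]:
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    def __call__(self, feat: Sequence[float]) -> List[List[float]]:
        h = [max(0.0, a) for a in self._matvec(self.w1, feat)]  # ReLU hidden layer
        out = self._matvec(self.w2, h)
        per_level = []
        for lvl in range(self.levels):
            vec = out[lvl * self.out_dim:(lvl + 1) * self.out_dim]
            norm = math.sqrt(sum(a * a for a in vec))
            # L2-normalise each level's slice; an all-zero slice stays zero.
            per_level.append([a / norm for a in vec] if norm > 0 else vec)
        return per_level


def best_level(level_embeddings: Sequence[Sequence[float]], query: Sequence[float]) -> int:
    """Hypothetical query step: index of the level whose embedding has the
    highest cosine similarity with a text embedding."""
    def cos(a: Sequence[float], b: Sequence[float]) -> float:
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

    scores = [cos(e, query) for e in level_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    anchors = voxelize_points([(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.1, 0.1)], 1.0)
    print(anchors)  # two occupied voxels -> two anchors
    decoder = LanguageDecoder(in_dim=4, hidden=8, out_dim=5, levels=3)
    print([len(e) for e in decoder([0.3, -0.2, 0.5, 0.1])])
```

The sketch uses one shared decoder that emits all levels from the same anchor feature. This reflects the abstract's claim that levels are trained jointly rather than as separate models. Storing compact per-anchor features instead of full per-Gaussian language vectors is one plausible source of the reported storage savings, but that link is an assumption and is not confirmed by this record.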

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 4952-4961
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 → 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 → 31/10/25

Keywords

  • 3d gaussians
  • fusion language splatting
  • open-vocabulary segmentation
  • point ambiguity issue
  • voxel-aware
