Trusted 3D self-supervised representation learning with cross-modal settings

  • Xu Han
  • Haozhe Cheng
  • Pengcheng Shi
  • Jihua Zhu

Xi'an Jiaotong University

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-modal settings that pair 2D images with 3D point clouds in self-supervised representation learning have proven effective for enhancing visual perception. However, different modalities use different data formats and representations, so directly using features extracted from cross-modal datasets may lead to conflicting information and representation collapse. We refer to this problem as uncertainty in network learning; reducing uncertainty to obtain trusted descriptions is therefore key to improving network performance. Motivated by this, we propose a trusted cross-modal network for self-supervised learning (TCMSS). It obtains trusted descriptions through a trusted combination module and improves network performance with a well-designed loss function. In the trusted combination module, we utilize the Dirichlet distribution and subjective logic to parameterize the features and acquire probabilistic uncertainty at the same time. Dempster-Shafer Theory (DST) is then used to obtain trusted descriptions by weighting the parameterized results with their uncertainty. We also design a trusted domain loss function, comprising a domain loss and a trusted loss, which effectively improves the prediction accuracy of the network by applying contrastive learning between different feature descriptions. Experimental results show that our model outperforms previous methods on linear classification on ScanObjectNN and on few-shot classification on both ModelNet40 and ScanObjectNN. Part segmentation on ShapeNet also reports a result superior to previous methods. Ablation studies further validate the potency of our method for better point cloud understanding.
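To make the abstract's pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the generic subjective-logic recipe it names: class evidence is parameterized as a Dirichlet distribution to yield per-class belief masses plus an explicit uncertainty mass, and two modality-specific opinions are fused with a reduced Dempster-Shafer combination rule. The function names, the toy evidence vectors, and the specific reduced combination formula are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dirichlet_opinion(evidence):
    """Turn non-negative per-class evidence (K classes) into a subjective-logic
    opinion. With Dirichlet parameters alpha = evidence + 1 and strength
    S = sum(alpha): belief b_k = evidence_k / S, uncertainty u = K / S,
    so sum(b) + u == 1. Little evidence => large u."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    S = evidence.sum() + K          # Dirichlet strength
    return evidence / S, K / S      # (belief masses, uncertainty mass)

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two opinions, e.g. one from an
    image branch and one from a point-cloud branch. C is the conflicting
    mass: probability that the two opinions commit to different classes."""
    C = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    b = (b1 * b2 + b1 * u2 + b2 * u1) / (1.0 - C)   # agreement reinforces belief
    u = (u1 * u2) / (1.0 - C)                        # fused uncertainty shrinks
    return b, u

# Toy example: two modalities that mostly agree on class 0.
b_img, u_img = dirichlet_opinion([9, 1, 0])   # image-branch evidence (illustrative)
b_pc, u_pc = dirichlet_opinion([8, 0, 2])     # point-cloud-branch evidence
b_fused, u_fused = ds_combine(b_img, u_img, b_pc, u_pc)
```

When the two branches agree, the fused uncertainty `u_fused` drops below either single-modality uncertainty, which is the sense in which the combination yields a more "trusted" description; a strongly conflicting pair would instead keep the fused belief hedged.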

Original language: English
Article number: 77
Journal: Machine Vision and Applications
Volume: 35
Issue number: 4
DOIs
State: Published - Jul 2024

Keywords

  • Contrastive learning
  • Cross-modal learning
  • Point clouds
  • Self-supervised representation learning
  • Uncertainty
