
DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery

  • Xi'an Jiaotong University
  • Lenovo Research
  • University of Massachusetts Boston

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

6 Citations (Scopus)

Abstract

Discovering fine-grained categories from coarsely labeled data is a practical and challenging task, which can bridge the gap between the demand for fine-grained analysis and the high annotation cost. Previous works mainly focus on instance-level discrimination to learn low-level features, but ignore semantic similarities between data, which may prevent these models from learning compact cluster representations. In this paper, we propose Denoised Neighborhood Aggregation (DNA), a self-supervised framework that encodes semantic structures of data into the embedding space. Specifically, we retrieve k-nearest neighbors of a query as its positive keys to capture semantic similarities between data and then aggregate information from the neighbors to learn compact cluster representations, which can make fine-grained categories more separable. However, the retrieved neighbors can be noisy and contain many false-positive keys, which can degrade the quality of learned embeddings. To cope with this challenge, we propose three principles to filter out these false neighbors for better representation learning. Furthermore, we theoretically justify that the learning objective of our framework is equivalent to a clustering loss, which can capture semantic similarities between data to form compact fine-grained clusters. Extensive experiments on three benchmark datasets show that our method can retrieve more accurate neighbors (21.31% accuracy improvement) and outperform state-of-the-art models by a large margin (average 9.96% improvement on three metrics). Our code and data are available at https://github.com/Lackel/DNA.
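The retrieve-then-aggregate idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the repository linked above). It assumes embeddings are compared by cosine similarity and that each query's k nearest neighbors serve as its positive keys. A hypothetical mutual-neighbor rule stands in for the paper's three denoising principles, which the abstract does not spell out, and the objective is a generic multi-positive InfoNCE loss. The names `knn_indices`, `mutual_filter`, `neighborhood_contrastive_loss`, and the values of `k` and `temperature` are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Row-wise L2 normalization, so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def knn_indices(emb: np.ndarray, k: int) -> np.ndarray:
    # For each row, return the indices of its k most cosine-similar rows.
    z = l2_normalize(emb)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a query is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def mutual_filter(nbrs: np.ndarray) -> np.ndarray:
    # Hypothetical denoising rule (NOT the paper's three principles):
    # keep j as a positive for i only if i is also among j's neighbors.
    n, k = nbrs.shape
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return mask & mask.T


def neighborhood_contrastive_loss(emb: np.ndarray, pos_mask: np.ndarray,
                                  temperature: float = 0.07) -> float:
    # Multi-positive InfoNCE: pull each query toward all of its kept
    # neighbors, contrasting against every other sample in the batch.
    z = l2_normalize(emb)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_norm
    counts = pos_mask.sum(axis=1)
    has_pos = counts > 0  # queries whose neighbors were all filtered are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / counts[has_pos]).mean())


# Toy usage: two tight clusters pointing in different directions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(5, 2))
cluster_b = rng.normal(loc=[0.0, 5.0], scale=0.1, size=(5, 2))
emb = np.vstack([cluster_a, cluster_b])

nbrs = knn_indices(emb, k=3)
pos = mutual_filter(nbrs)
loss = neighborhood_contrastive_loss(emb, pos)
print(nbrs)
print(round(loss, 4))
```

On this toy data every retrieved neighbor falls inside its query's own cluster, so the filtered positive mask never links the two clusters. With a real encoder, the neighbor lists would typically be recomputed from the current embeddings as training proceeds; that loop is omitted here.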

Original language: English
Host publication title: EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics (ACL)
Pages: 12292-12302
Number of pages: 11
ISBN (Electronic): 9798891760608
DOI
Publication status: Published - 2023
Event: 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023

Publication series

Name: EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Country/Territory: Singapore
Hybrid, Singapore
Period: 6/12/23 to 10/12/23

