A Neural Corpus Indexer for Document Retrieval

  • Yujing Wang
  • , Yingyan Hou
  • , Haonan Wang
  • , Ziming Miao
  • , Shibin Wu
  • , Hao Sun
  • , Qi Chen
  • , Yuqing Xia
  • , Chengmin Chi
  • , Guoshuai Zhao
  • , Zheng Liu
  • , Xing Xie
  • , Hao Allen Sun
  • , Weiwei Deng
  • , Qi Zhang
  • , Mao Yang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

111 Scopus citations

Abstract

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to be directly optimized for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize the recall performance of NCI, we invent a prefix-aware weight-adaptive decoder architecture, and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrated the superiority of NCI on two commonly used academic benchmarks, achieving +21.4% and +16.8% relative enhancement for Recall@1 on NQ320k dataset and R-Precision on TriviaQA dataset, respectively, compared to the best baseline method.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
EditorsS. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713871088
StatePublished - 2022
Externally publishedYes
Event36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 20229 Dec 2022

Publication series

NameAdvances in Neural Information Processing Systems
Volume35
ISSN (Print)1049-5258

Conference

Conference36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/TerritoryUnited States
CityNew Orleans
Period28/11/229/12/22

Fingerprint

Dive into the research topics of 'A Neural Corpus Indexer for Document Retrieval'. Together they form a unique fingerprint.

Cite this