
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

  • Fangzhi Xu
  • Hang Yan
  • Chang Ma
  • Haiteng Zhao
  • Qiushi Sun
  • Kanzhi Cheng
  • Junxian He
  • Jun Liu
  • Zhiyong Wu
  • Xi'an Jiaotong University
  • Shanghai Artificial Intelligence Laboratory
  • Ministry of Education Key Laboratory of Intelligent Networks and Network Security
  • Shaanxi Province Key Laboratory of Big Data Knowledge Engineering
  • The University of Hong Kong
  • Peking University
  • Hong Kong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Advancing LLM reasoning skills has attracted wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, which suffer from limited scalability and high annotation costs. This motivates us to enhance LLM reasoning without external supervision. We introduce a generalizable and purely unsupervised self-training framework named Genius. Without external auxiliary signals, Genius must seek the optimal response sequence in a stepwise manner and optimize the LLM accordingly. To explore potential steps and exploit the optimal ones, Genius introduces a stepwise foresight re-sampling strategy that samples candidate steps and estimates their value by simulating future outcomes. Further, we recognize that the unsupervised setting inevitably induces intrinsic noise and uncertainty. To provide robust optimization, we propose an advantage-calibrated optimization (ACO) loss function to mitigate estimation inconsistencies. Combining these techniques, Genius provides an advanced initial step towards self-improving LLM reasoning with general queries and without supervision, revolutionizing reasoning scaling laws given the vast availability of general queries. The code will be released at https://github.com/xufangzhi/Genius.
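
The abstract describes foresight re-sampling only at a high level, so the following is a minimal illustrative sketch of the general idea (score each candidate step by simulating future outcomes, then re-sample in proportion to the estimated value), not the authors' implementation. The names `foresight_resample`, `propose_steps`, and `rollout_score`, the mean-of-rollouts value estimate, and the softmax re-sampling rule are all assumptions made for illustration; the ACO loss is not covered here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def foresight_resample(
    prefix: List[str],
    propose_steps: Callable[[List[str]], List[str]],
    rollout_score: Callable[[List[str]], float],
    num_rollouts: int = 4,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[float]]:
    """Pick the next reasoning step by simulating future outcomes.

    propose_steps (hypothetical): returns candidate next steps for a prefix.
    rollout_score (hypothetical): simulates one continuation of the given
    partial response and returns a scalar score for it.
    """
    rng = rng or random.Random(0)
    candidates = propose_steps(prefix)

    # Assumption: a step's value is the mean score of its simulated rollouts.
    values = []
    for step in candidates:
        scores = [rollout_score(prefix + [step]) for _ in range(num_rollouts)]
        values.append(sum(scores) / num_rollouts)

    # Assumption: re-sample one step with probability softmax(value).
    probs = softmax(values, temperature)
    chosen = rng.choices(candidates, weights=probs, k=1)[0]
    return chosen, values


if __name__ == "__main__":
    # Toy stand-ins for an LLM: two candidate steps, one clearly better.
    toy_propose = lambda prefix: ["factor the expression", "guess the answer"]
    toy_score = lambda seq: 1.0 if seq[-1].startswith("factor") else 0.0
    step, vals = foresight_resample([], toy_propose, toy_score, temperature=0.5)
    print(step, vals)
```

In a real system the two callables would wrap LLM sampling and some unsupervised score of the simulated continuation; the temperature controls how strongly selection favors high-value steps over exploration.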

Original language: English
Title of host publication: Long Papers
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 13153-13167
Number of pages: 15
ISBN (electronic): 9798891762510
Publication status: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25
