
Text Grouping Adapter: Adapting Pre-Trained Text Detector for Layout Analysis

  • Tianci Bi
  • Xiaoyi Zhang
  • Zhizheng Zhang
  • Wenxuan Xie
  • Cuiling Lan
  • Yan Lu
  • Nanning Zheng
  • Xi'an Jiaotong University
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

5 citations (Scopus)

Abstract

Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances into paragraphs, has not kept pace. Previous works either treat text detection and grouping with separate models or train a unified model from scratch. None of them makes full use of already well-trained text detectors and easily obtainable detection datasets. In this paper, we present Text Grouping Adapter (TGA), a module that enables various pretrained text detectors to learn layout analysis, allowing us to adopt a well-trained text detector off the shelf or fine-tune it efficiently. Designed to be compatible with various text detector architectures, TGA takes detected text regions and image features as universal inputs to assemble text instance features. To capture broader contextual information for layout analysis, we propose to predict text group masks from text instance features by one-to-many assignment. Our comprehensive experiments demonstrate that, even with frozen pretrained models, incorporating TGA into various pretrained text detectors and text spotters achieves superior layout analysis performance while inheriting the generalized text detection ability from pretraining. Full-parameter fine-tuning further improves layout analysis performance.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 28150-28159
Number of pages: 10
ISBN (electronic): 9798350353006
ISBN (print): 9798350353006
DOI
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 to 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 to 22/06/24
