RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

  • Xiaoyu Yue
  • Zhanghui Kuang
  • Chenhao Lin
  • Hongbin Sun
  • Wayne Zhang
  • SenseTime Group Limited

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

160 citations (Scopus)

Abstract

The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts (e.g., random character sequences), which is unacceptable in most real application scenarios. In this paper, we first investigate the decoding process of the decoder in depth. We empirically find that a representative character-level sequence decoder utilizes not only context information but also positional information. Contextual information, which the existing approaches rely on heavily, causes the problem of attention drift. To suppress this side effect, we propose a novel position enhancement branch and dynamically fuse its outputs with those of the decoder attention module for scene text recognition. Specifically, the branch contains a position-aware module that enables the encoder to output feature vectors encoding their own spatial positions, and an attention module that estimates glimpses using only the positional clue (i.e., the current decoding time step). The dynamic fusion is conducted via an element-wise gate mechanism to obtain more robust features. Theoretically, our proposed method, dubbed RobustScanner, decodes individual characters with a dynamic ratio between context and positional clues, relies more on positional clues when decoding sequences with scarce context, and is thus robust and practical. Empirically, it achieves new state-of-the-art results on popular regular and irregular text recognition benchmarks without much performance drop on contextless benchmarks, validating its robustness in both contextual and contextless application scenarios.
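
The element-wise gate mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact fusion formula, so the formulation below is an assumption chosen for illustration: a sigmoid gate computed from the concatenated glimpses, followed by a per-element convex combination. The names `gated_fusion`, `context_glimpse`, `position_glimpse`, `weights`, and `bias` are hypothetical, and the numbers are made up.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(context_glimpse, position_glimpse, weights, bias):
    """Fuse two glimpse vectors with an element-wise gate.

    Assumed (hypothetical) formulation, not taken from the paper:
        gate_i  = sigmoid(weights[i] . [context; position] + bias[i])
        fused_i = gate_i * context_i + (1 - gate_i) * position_i
    """
    if len(context_glimpse) != len(position_glimpse):
        raise ValueError("glimpse vectors must have the same length")
    concat = list(context_glimpse) + list(position_glimpse)
    fused, gates = [], []
    for i, (c, p) in enumerate(zip(context_glimpse, position_glimpse)):
        # One gate value per feature dimension, computed from both glimpses.
        logit = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(logit)
        gates.append(gate)
        fused.append(gate * c + (1.0 - gate) * p)
    return fused, gates


# Toy example with made-up numbers: 2-dimensional glimpses.
context = [1.0, 0.0]
position = [0.0, 1.0]
zero_weights = [[0.0] * 4 for _ in range(2)]

# A strongly negative bias closes the gate, so the positional glimpse dominates.
fused_pos, gates_pos = gated_fusion(context, position, zero_weights, [-20.0, -20.0])

# A zero bias gives gate = 0.5, an equal mix of both glimpses.
fused_mix, gates_mix = gated_fusion(context, position, zero_weights, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, so the ratio between context and positional clues could differ for each feature dimension and each decoding step; the fixed weights here only show the two extremes of that ratio.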

Original language: English
Title of host publication: Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 135-151
Number of pages: 17
ISBN (Print): 9783030585280
DOI
Publication status: Published - 2020
Event: 16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12364 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th European Conference on Computer Vision, ECCV 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 23/08/20 to 28/08/20
