A Novel Degraded Document Binarization Model through Vision Transformer Network

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Degraded document binarization has received considerable attention because of its strong influence on subsequent document analysis tasks. In this study, we propose a novel Degraded Document Binarization model built on the vision transFormer framework, termed D2BFormer. Because it is trainable end to end, the D2BFormer model can autonomously optimize the parameters of the entire learning pipeline without requiring an intensity-to-binary value conversion phase, which improves binarization quality. In addition, we propose a novel dual-branched encoding feature fusion module, which combines architectural components from the vision transformer framework and deep convolutional neural networks. The resulting encoding module extracts features from an input document that are sensitive to both global and local characteristics. The proposed encoding module also operates internally at a much lower spatial resolution than that of the raw input document, which reduces computational complexity. Furthermore, we propose a novel progressively merged decoding feature fusion module that uses carefully introduced skip connections both inside and outside the decoding network. The resulting decoding module progressively combines counterpart features, taken from encoding layers with comparable spatial resolutions, with up-sampled features generated by previous layers of the decoding network. Finally, experiments conducted on ten public datasets demonstrate that the proposed D2BFormer model achieves promising performance in terms of four metrics.
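
The data flow described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: every function name, the channel counts, the number of stages, and the fixed (untrained) operators are assumptions chosen only to show the idea. A 3x3 mean filter stands in for the convolutional branch, parameter-free self-attention stands in for the transformer branch, and the decoder concatenates up-sampled features with encoder features of matching resolution.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool a (C, H, W) map by `factor` (stand-in for a learned stage)."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample(x, factor=2):
    """Nearest-neighbour up-sampling of a (C, H, W) map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def local_branch(x):
    """Hypothetical convolutional branch: 3x3 mean filter (local context)."""
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def global_branch(x):
    """Hypothetical transformer branch: self-attention over all positions."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                # (N, C), one token per pixel
    scores = tokens @ tokens.T / np.sqrt(c)       # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ tokens).T.reshape(c, h, w)

def encode(image, stages=3):
    """Dual-branch encoder working below the input resolution."""
    feats, x = [], image
    for _ in range(stages):
        x = downsample(x)
        # Fuse local and global features along the channel axis.
        x = np.concatenate([local_branch(x), global_branch(x)], axis=0)
        feats.append(x)
    return feats

def decode(feats):
    """Progressively merge up-sampled features with encoder skip features."""
    x = feats[-1]
    for skip in reversed(feats[:-1]):
        x = upsample(x)                           # now matches skip's resolution
        x = np.concatenate([x, skip], axis=0)     # skip connection
    x = upsample(x)                               # back to input resolution
    logits = x.mean(axis=0)                       # placeholder for a learned head
    return 1.0 / (1.0 + np.exp(-logits))          # per-pixel score in (0, 1)

image = np.random.default_rng(0).random((1, 32, 32))  # synthetic grayscale page
feats = encode(image)
scores = decode(feats)
```

With a 32x32 input, the encoder features have 16x16, 8x8, and 4x4 spatial resolution, and the decoder returns a 32x32 map of per-pixel scores. A real model would replace each fixed operator with trained layers.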

Original language: English
Pages (from-to): 159-173
Number of pages: 15
Journal: Information Fusion
Volume: 93
State: Published - May 2023
Externally published: Yes

Keywords

  • Convolutional neural network
  • Degraded document binarization
  • Feature fusion
  • Vision transformer network
