
Improving OCR performance in biomedical literature retrieval through preprocessing and postprocessing

  • Songhua Xu
  • James McCusker
  • Martin Schultz
  • Michael Krauthammer
  • Yale University

Research output: Contribution to conference, Paper, peer-reviewed

3 citations (Scopus)

Abstract

Today's information retrieval (IR) techniques are mostly text-based. As a consequence, some types of information are beyond the reach of text-based IR systems, which fail when textual information cannot be easily accessed, for example text embedded in biomedical images and figures. To tackle such situations, we propose to augment IR systems with the ability to perform optical character recognition (OCR). A principal obstacle is the accuracy of the OCR procedure, which is often error-prone. In our work, we introduce preprocessing and postprocessing techniques for improving OCR performance. Our preprocessing stage separates text from graphical elements in an image so that the graphics do not degrade OCR performance, as today's OCR engines are optimized for documents without graphical elements. Our postprocessing stage performs context-based correction of the OCR results. Experimental results show that these preprocessing and postprocessing techniques can consistently improve the performance of biomedical image OCR in terms of either precision or recall.
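
The abstract does not say how the context-based correction is carried out. Purely as an illustration, the Python sketch below assumes one simple form of it: each OCR token is compared against a vocabulary drawn from the surrounding text (for instance the figure caption) and replaced by its closest match when the similarity is high enough. The function name `correct_tokens`, the `cutoff` value, and the sample vocabulary are hypothetical and are not taken from the paper.

```python
import difflib


def correct_tokens(ocr_tokens, context_vocabulary, cutoff=0.75):
    """Replace each OCR token with its closest context word, if one is similar enough.

    Assumption: "context" is a plain list of words taken from text near the image.
    This is a generic lexicon-matching sketch, not the method used by the authors.
    """
    # Map lowercase forms back to the spelling used in the context.
    lookup = {word.lower(): word for word in context_vocabulary}
    candidates = list(lookup)
    corrected = []
    for token in ocr_tokens:
        key = token.lower()
        if key in lookup:
            # Exact (case-insensitive) hit: adopt the context spelling.
            corrected.append(lookup[key])
            continue
        # Otherwise look for the single most similar context word.
        matches = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
        corrected.append(lookup[matches[0]] if matches else token)
    return corrected


if __name__ == "__main__":
    context = ["kinase", "phosphorylation", "apoptosis"]
    print(correct_tokens(["k1nase", "ap0ptosis", "xyz"], context))
```

Tokens with no sufficiently similar context word are left unchanged, so a stricter `cutoff` trades fewer corrections for fewer wrong substitutions.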

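Original language: English
Pages: 161-164
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku, Finland
Duration: 1 Sep 2008 to 3 Sep 2008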

Conference

Conference: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008
Country/Territory: Finland
City: Turku
Period: 1/09/08 to 3/09/08
