
A Deep Multi-Modal CNN for Multi-Instance Multi-Label Image Classification

  • Lingyun Song
  • Jun Liu
  • Buyue Qian
  • Mingxuan Sun
  • Kuan Yang
  • Meng Sun
  • Samar Abbas

Affiliations

  • Xi'an Jiaotong University
  • University of California at Santa Barbara
  • Louisiana State University

Research output: Contribution to journal › Article › peer-review

110 Scopus citations

Abstract

Deep convolutional neural networks (CNNs) have shown superior performance on the task of single-label image classification. However, the applicability of CNNs to multi-label images remains an open problem, mainly for two reasons. First, each image is usually treated as an inseparable entity and represented as a single instance, which mixes the visual information corresponding to different labels. Second, the correlations among labels are often overlooked. To address these limitations, we propose a deep multi-modal CNN for multi-instance multi-label image classification, called MMCNN-MIML. By combining CNNs with multi-instance multi-label (MIML) learning, our model represents each image as a bag of instances and inherits the merits of both CNNs and MIML. In particular, MMCNN-MIML has three main appealing properties: 1) it automatically generates instance representations for MIML by exploiting the architecture of CNNs; 2) it takes advantage of label correlations by grouping labels in its later layers; and 3) it incorporates the textual context of label groups to generate multi-modal instances, which help discriminate visually similar objects belonging to different groups. Empirical studies on several benchmark multi-label image data sets show that MMCNN-MIML significantly outperforms state-of-the-art baselines on multi-label image classification tasks.
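The abstract describes the model only at a high level. As a rough illustration, and not the authors' implementation, the following sketch shows the general MIML idea under several assumptions that do not come from the paper: each spatial cell of a CNN feature map is treated as one instance; label groups and one textual context vector per group are given in advance; each label is scored by a linear function of the multi-modal instance (visual features concatenated with the group context); and instance scores are aggregated by max-pooling, following the common MIML assumption that a label applies to a bag if at least one instance supports it. All function names, inputs, and weights are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def feature_map_to_instances(fmap: List[List[Vector]]) -> List[Vector]:
    """Flatten an H x W grid of C-dim CNN features into a bag of H*W instances.

    Assumption: one instance per spatial cell (the paper's exact scheme may differ).
    """
    return [list(cell) for row in fmap for cell in row]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def miml_scores(
    bag: List[Vector],
    label_groups: Dict[str, List[str]],
    group_context: Dict[str, Vector],
    weights: Dict[str, Vector],
) -> Dict[str, float]:
    """Score every label of every group for one image (a bag of instances)."""
    scores: Dict[str, float] = {}
    for group, labels in label_groups.items():
        ctx = group_context[group]
        # Multi-modal instances: visual features concatenated with the
        # group's (hypothetical) textual context vector.
        mm_bag = [inst + ctx for inst in bag]
        for label in labels:
            w = weights[label]
            # Max-pooling over instances: the image receives the label if
            # at least one instance supports it strongly.
            best = max(dot(w, inst) for inst in mm_bag)
            scores[label] = sigmoid(best)
    return scores


if __name__ == "__main__":
    # Toy 2 x 1 feature map with C = 2 channels -> a bag of two instances.
    fmap = [[[1.0, 0.0]], [[0.0, 1.0]]]
    bag = feature_map_to_instances(fmap)
    groups = {"animals": ["cat", "dog"], "vehicles": ["car"]}
    context = {"animals": [1.0], "vehicles": [0.0]}
    weights = {
        "cat": [2.0, 0.0, 0.0],
        "dog": [0.0, -1.0, 0.5],
        "car": [0.0, 3.0, 0.0],
    }
    for label, p in miml_scores(bag, groups, context, weights).items():
        print(f"{label}: {p:.3f}")
```

Max-pooling is only one common aggregation choice in MIML; how MMCNN-MIML actually forms instances, learns label groups, and encodes textual context is not specified in this abstract.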

Original language: English
Article number: 8432496
Pages (from-to): 6025-6038
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 27
Issue number: 12
State: Published - Dec 2018

Keywords

  • CNN
  • MIML
  • context information
  • label correlations
  • multi-label image classification
