Abstract
Deep learning has been successfully applied to multimodal representation learning. Like single-modal deep learning methods, these multimodal models combine greedy layer-wise feedforward propagation with backpropagation (BP) fine-tuning driven by diverse objectives, and they are therefore time-consuming to train. The extreme learning machine (ELM), by contrast, is a fast learning algorithm for single-hidden-layer feedforward neural networks, and previous work has shown the effectiveness of ELM-based hierarchical frameworks for multilayer perceptrons. In this paper, we introduce an ELM-based hierarchical framework for multimodal data. The proposed architecture consists of three main components: (1) self-taught feature extraction for each specific modality by an ELM-based sparse autoencoder, (2) fused representation learning based on the features learned in the previous step, and (3) supervised classification based on the fused representation. This is a purely feedforward framework: once a layer is established, its weights are fixed without fine-tuning, so it trains far more efficiently than gradient-based multimodal deep learning methods. Experiments on the MNIST, XRMB and NUS datasets show that the proposed algorithm converges faster and achieves better classification performance than existing multimodal deep learning models.
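The speed advantage claimed in the abstract comes from the basic ELM training step: hidden-layer weights are drawn at random and never tuned, and only the output weights are solved in closed form via a pseudoinverse. A minimal NumPy sketch of that step follows; the function names and sizes are illustrative, not the paper's actual code.

```python
import numpy as np

def elm_train(X, T, n_hidden=64, seed=0):
    """Train a single-hidden-layer ELM (illustrative sketch).

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    """
    rng = np.random.default_rng(seed)
    # Input weights and biases are random and stay fixed (no fine-tuning).
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    # Output weights from the Moore-Penrose pseudoinverse (least squares).
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because the only learned parameters (`beta`) come from one linear solve, each layer of the hierarchical framework can be trained in a single forward pass, which is why no BP fine-tuning stage is needed.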
| Original language | English |
|---|---|
| Pages (from-to) | 141-150 |
| Number of pages | 10 |
| Journal | Neurocomputing |
| Volume | 322 |
| DOI | |
| Publication status | Published - 17 Dec 2018 |
Title: An effective hierarchical extreme learning machine based multimodal fusion framework