TY - JOUR
T1 - Deep semisupervised zero-shot learning with maximum mean discrepancy
AU - Zhang, Lingling
AU - Liu, Jun
AU - Luo, Minnan
AU - Chang, Xiaojun
AU - Zheng, Qinghua
N1 - Publisher Copyright: © 2018 Massachusetts Institute of Technology.
PY - 2018/5/1
Y1 - 2018/5/1
AB - Due to the difficulty of collecting labeled images for hundreds of thousands of visual categories, zero-shot learning, where unseen categories do not have any labeled images in the training stage, has attracted more attention. In the past, many studies focused on transferring knowledge from seen to unseen categories by projecting all category labels into a semantic space. However, the label embeddings could not adequately express the semantics of categories. Furthermore, the common semantics of seen and unseen instances cannot be captured accurately because the distributions of these instances may be quite different. To address these issues, we propose a novel deep semisupervised method that jointly considers the heterogeneity gap between different modalities and the correlation among unimodal instances. This method replaces the original labels with the corresponding textual descriptions to better capture the category semantics. It also overcomes the problem of distribution difference by minimizing the maximum mean discrepancy between seen and unseen instance distributions. Extensive experimental results on two benchmark data sets, CU200-Birds and Oxford Flowers-102, indicate that our method achieves significant improvements over previous methods.
UR - https://www.scopus.com/pages/publications/85048677707
U2 - 10.1162/neco_a_01071
DO - 10.1162/neco_a_01071
M3 - Letter
C2 - 29566352
AN - SCOPUS:85048677707
SN - 0899-7667
VL - 30
SP - 1426
EP - 1447
JO - Neural Computation
JF - Neural Computation
IS - 5
ER -