TY - GEN
T1 - Leveraging multi-modal prior knowledge for large-scale concept learning in noisy web data
AU - Liang, Junwei
AU - Jiang, Lu
AU - Meng, Deyu
AU - Hauptmann, Alexander
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/6/6
Y1 - 2017/6/6
N2 - Learning video concept detectors automatically from big but noisy web data, with no additional manual annotations, is a novel but challenging area in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title and other multi-modal information, which provides weak annotations or labels about the video content. To tackle the problem of large-scale noisy learning, we propose a novel method called Multimodal WEbly-Labeled Learning (WELL-MM), which builds on a state-of-the-art machine learning algorithm inspired by the human learning process. WELL-MM introduces a novel multimodal approach to incorporate meaningful prior knowledge, called a curriculum, from the noisy web videos. We empirically study curricula constructed from the multi-modal features of Internet videos and images. Comprehensive experimental results on FCVID and YFCC100M demonstrate that WELL-MM outperforms state-of-the-art methods by a statistically significant margin on learning concepts from noisy web video data. In addition, the results verify that WELL-MM is robust to the level of noise in the video data. Notably, WELL-MM trained on a sufficient amount of noisy web labels achieves better accuracy than supervised learning methods trained on clean, manually labeled data.
AB - Learning video concept detectors automatically from big but noisy web data, with no additional manual annotations, is a novel but challenging area in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title and other multi-modal information, which provides weak annotations or labels about the video content. To tackle the problem of large-scale noisy learning, we propose a novel method called Multimodal WEbly-Labeled Learning (WELL-MM), which builds on a state-of-the-art machine learning algorithm inspired by the human learning process. WELL-MM introduces a novel multimodal approach to incorporate meaningful prior knowledge, called a curriculum, from the noisy web videos. We empirically study curricula constructed from the multi-modal features of Internet videos and images. Comprehensive experimental results on FCVID and YFCC100M demonstrate that WELL-MM outperforms state-of-the-art methods by a statistically significant margin on learning concepts from noisy web video data. In addition, the results verify that WELL-MM is robust to the level of noise in the video data. Notably, WELL-MM trained on a sufficient amount of noisy web labels achieves better accuracy than supervised learning methods trained on clean, manually labeled data.
KW - Big data
KW - Concept detection
KW - Noisy data
KW - Prior knowledge
KW - Video understanding
KW - Web label
KW - Webly-supervised learning
UR - https://www.scopus.com/pages/publications/85021837074
U2 - 10.1145/3078971.3079003
DO - 10.1145/3078971.3079003
M3 - Conference contribution
AN - SCOPUS:85021837074
T3 - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
SP - 32
EP - 40
BT - ICMR 2017 - Proceedings of the 2017 ACM International Conference on Multimedia Retrieval
PB - Association for Computing Machinery, Inc
T2 - 17th ACM International Conference on Multimedia Retrieval, ICMR 2017
Y2 - 6 June 2017 through 9 June 2017
ER -