TY - JOUR
T1 - Multi-view cognition with path search for one-shot part labeling
AU - Wang, Shaowei
AU - Zhang, Lingling
AU - Qin, Tao
AU - Liu, Jun
AU - Li, Yifei
AU - Wang, Qianying
AU - Zheng, Qinghua
N1 - Publisher Copyright:
© 2024 Elsevier Inc.
PY - 2024/7
Y1 - 2024/7
N2 - Diagrams are an abstract form of visual expression in education, often used to express complex phenomena and convey logical relationships. In recent years, tasks such as diagram classification and textbook question answering have attracted attention and become new benchmarks for evaluating the complex reasoning ability of models. However, due to the lack of large corpora and the abstract, sparse nature of diagram visuals, methods designed for natural images struggle to achieve good results on diagrams. To address these challenges, we adopt the one-shot setting to cope with the limited-sample challenge and use part labeling to enhance the learning of relational structures. By definition, the one-shot part labeling task is to label multiple parts of an object in a query diagram given only a single support diagram of that category. Under this setting, we propose the Automated Search Multi-view Matching Network (Auto-MMN), which simulates human cognitive processes in the set-to-set matching problem. We define three view operations based on the attention mechanism and a multiplex graph: the learning of global visual features (global–local view), the interaction between neighboring parts (local–local view), and the comparison of counterparts (cross-local view). We also propose a novel learning path search technique that adaptively plans paths over these three views and improves the generalization performance of the model. We evaluate Auto-MMN on three different datasets, covering image-to-image, diagram-to-diagram, and image-to-diagram part labeling scenarios. Extensive experiments show that our model significantly outperforms other baselines across these scenarios, and that both the multi-view operations and the learning path search contribute substantially. We open-source the core code at https://github.com/WayneWong97/Auto-MMN.
AB - Diagrams are an abstract form of visual expression in education, often used to express complex phenomena and convey logical relationships. In recent years, tasks such as diagram classification and textbook question answering have attracted attention and become new benchmarks for evaluating the complex reasoning ability of models. However, due to the lack of large corpora and the abstract, sparse nature of diagram visuals, methods designed for natural images struggle to achieve good results on diagrams. To address these challenges, we adopt the one-shot setting to cope with the limited-sample challenge and use part labeling to enhance the learning of relational structures. By definition, the one-shot part labeling task is to label multiple parts of an object in a query diagram given only a single support diagram of that category. Under this setting, we propose the Automated Search Multi-view Matching Network (Auto-MMN), which simulates human cognitive processes in the set-to-set matching problem. We define three view operations based on the attention mechanism and a multiplex graph: the learning of global visual features (global–local view), the interaction between neighboring parts (local–local view), and the comparison of counterparts (cross-local view). We also propose a novel learning path search technique that adaptively plans paths over these three views and improves the generalization performance of the model. We evaluate Auto-MMN on three different datasets, covering image-to-image, diagram-to-diagram, and image-to-diagram part labeling scenarios. Extensive experiments show that our model significantly outperforms other baselines across these scenarios, and that both the multi-view operations and the learning path search contribute substantially. We open-source the core code at https://github.com/WayneWong97/Auto-MMN.
KW - Diagram understanding
KW - One-shot learning
KW - Part labeling
UR - https://www.scopus.com/pages/publications/85190831046
U2 - 10.1016/j.cviu.2024.104015
DO - 10.1016/j.cviu.2024.104015
M3 - Article
AN - SCOPUS:85190831046
SN - 1077-3142
VL - 244
JO - Computer Vision and Image Understanding
JF - Computer Vision and Image Understanding
M1 - 104015
ER -