Inferring tasks and fluents in videos by learning causal relations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Recognizing time-varying object states in complex tasks is an important and challenging problem. In this paper, we propose a novel model to jointly infer object fluents and complex tasks in videos. A task is a complex human activity with specific goals, and a fluent is defined as a time-varying object state. A hierarchical graph represents a task as a human action stream together with multiple concurrent object fluents that vary as the human performs the actions. In this process, human actions serve as the causes of object state changes, which in turn reflect the effects of those actions. For a given input video, a causal sampling search algorithm is proposed to jointly infer the task category and the states of objects in each video frame. For model learning, a structural SVM framework is adopted to jointly train the task, fluent, cause, and effect parameters. We test the proposed method on a task and fluent dataset. Experimental results demonstrate the effectiveness of the proposed method.
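To make the abstract's joint-inference idea concrete, here is a minimal toy sketch of scoring a task hypothesis against an action stream and concurrent fluent changes, with a cause-effect compatibility term. All names, vocabularies, and weights below are hypothetical illustrations, and the exhaustive search stands in for the paper's causal sampling search; this is not the authors' actual model or learned parameters.

```python
# Toy sketch of joint task/fluent inference via cause-effect scoring.
# The vocabularies and unit weights are invented for illustration only.

# Hypothetical tasks and the actions typical of each.
TASK_ACTIONS = {
    "make_tea":    ["fill_kettle", "boil_water", "pour_water"],
    "make_coffee": ["fill_kettle", "boil_water", "press_plunger"],
}

# Hypothetical causal links: the fluent (object, state) change
# that each action tends to cause.
ACTION_EFFECT = {
    "fill_kettle":   ("kettle", "full"),
    "boil_water":    ("water", "hot"),
    "pour_water":    ("cup", "full"),
    "press_plunger": ("coffee", "ready"),
}

def score(task, actions, fluents):
    """Toy joint score: reward actions compatible with the task,
    plus cause-effect agreement between each action and the
    observed fluent change at the same step."""
    s = 0.0
    for act, (obj, state) in zip(actions, fluents):
        s += 1.0 if act in TASK_ACTIONS[task] else -1.0          # task term
        s += 1.0 if ACTION_EFFECT[act] == (obj, state) else 0.0  # causal term
    return s

def infer(actions, fluents):
    """Exhaustive stand-in for the causal sampling search:
    return the task maximizing the joint score."""
    return max(TASK_ACTIONS, key=lambda t: score(t, actions, fluents))
```

For example, `infer(["fill_kettle", "boil_water", "pour_water"], [("kettle", "full"), ("water", "hot"), ("cup", "full")])` returns `"make_tea"`, since both the task term and the causal term favor that hypothesis.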

Original language: English
Title of host publication: Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7566-7572
Number of pages: 7
ISBN (Electronic): 9781728188089
DOIs
State: Published - 2020
Event: 25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy
Duration: 10 Jan 2021 – 15 Jan 2021

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Conference

Conference: 25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory: Italy
City: Virtual, Milan
Period: 10/01/21 – 15/01/21
