Inferring Shared Attention in Social Scene Videos

  • Lifeng Fan
  • , Yixin Chen
  • , Ping Wei
  • , Wenguan Wang
  • , Song Chun Zhu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

77 Scopus citations

Abstract

This paper addresses a new problem of inferring shared attention in third-person social scene videos. Shared attention is a phenomenon that two or more individuals simultaneously look at a common target in social scenes. Perceiving and identifying shared attention in videos plays crucial roles in social activities and social scene understanding. We propose a spatial-temporal neural network to detect shared attention intervals in videos and predict shared attention locations in frames. In each video frame, human gaze directions and potential target boxes are two key features for spatially detecting shared attention in the social scene. In temporal domain, a convolutional Long Short-Term Memory network utilizes the temporal continuity and transition constraints to optimize the predicted shared attention heatmap. We collect a new dataset VideoCoAtt1 from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study. Experiments on this dataset show that our model can effectively infer shared attention in videos. We also empirically verify the effectiveness of different components in our model.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
PublisherIEEE Computer Society
Pages6460-6468
Number of pages9
ISBN (Electronic)9781538664209
DOIs
StatePublished - 14 Dec 2018
Event31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 - Salt Lake City, United States
Duration: 18 Jun 201822 Jun 2018

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919

Conference

Conference31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Country/TerritoryUnited States
CitySalt Lake City
Period18/06/1822/06/18

Fingerprint

Dive into the research topics of 'Inferring Shared Attention in Social Scene Videos'. Together they form a unique fingerprint.

Cite this