TY - JOUR
T1 - Multimodal Information Fusion Approach for Noncontact Heart Rate Estimation Using Facial Videos and Graph Convolutional Network
AU - Yue, Zijie
AU - Ding, Shuai
AU - Yang, Shanlin
AU - Wang, Linjie
AU - Li, Yinghui
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Heart rate (HR) is a critical signal for reflecting human physical and mental conditions, and it is beneficial for diagnosing neurological and cardiovascular diseases due to its excellent accessibility. However, traditional HR measurement devices have limited usability and convenience. Recent studies have shown that the optical absorption variation of human skin due to blood volume variation in cardiac cycles can be acquired from facial videos and used to estimate HR in a noncontact manner. However, most advanced noncontact HR estimation approaches rely on a single HR information source, resulting in unsatisfactory estimation results due to noise corruption and insufficient information. To address these problems, this article proposes a multimodal information fusion framework for noncontact HR estimation. First, feature representation maps are used to effectively extract periodic signals from facial visible-light and thermal infrared videos. Then, a temporal-information-aware HR feature extraction network (THR-Net) for encoding discriminative spatiotemporal information from the representation maps is presented. Finally, based on a graph convolution network (GCN), an information fusion model is proposed for feature integration and HR estimation. Experimental and evaluation results on five different metrics across two datasets show that the proposed approach outperforms state-of-the-art approaches. This article demonstrates the advantage of multimodal information fusion for noncontact HR estimation.
AB - Heart rate (HR) is a critical signal for reflecting human physical and mental conditions, and it is beneficial for diagnosing neurological and cardiovascular diseases due to its excellent accessibility. However, traditional HR measurement devices have limited usability and convenience. Recent studies have shown that the optical absorption variation of human skin due to blood volume variation in cardiac cycles can be acquired from facial videos and used to estimate HR in a noncontact manner. However, most advanced noncontact HR estimation approaches rely on a single HR information source, resulting in unsatisfactory estimation results due to noise corruption and insufficient information. To address these problems, this article proposes a multimodal information fusion framework for noncontact HR estimation. First, feature representation maps are used to effectively extract periodic signals from facial visible-light and thermal infrared videos. Then, a temporal-information-aware HR feature extraction network (THR-Net) for encoding discriminative spatiotemporal information from the representation maps is presented. Finally, based on a graph convolution network (GCN), an information fusion model is proposed for feature integration and HR estimation. Experimental and evaluation results on five different metrics across two datasets show that the proposed approach outperforms state-of-the-art approaches. This article demonstrates the advantage of multimodal information fusion for noncontact HR estimation.
KW - Attention mechanism
KW - Deep learning
KW - Graph convolution network (GCN)
KW - Multimodal information fusion
KW - Noncontact heart rate (HR) estimation
UR - https://www.scopus.com/pages/publications/85120074707
U2 - 10.1109/TIM.2021.3129498
DO - 10.1109/TIM.2021.3129498
M3 - Article
AN - SCOPUS:85120074707
SN - 0018-9456
VL - 71
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
ER -