TY - JOUR
T1 - XRF V2
T2 - A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses
AU - Lan, Bo
AU - Li, Pei
AU - Yin, Jiaxi
AU - Song, Yunpeng
AU - Wang, Ge
AU - Ding, Han
AU - Han, Jinsong
AU - Wang, Fei
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/9/3
Y1 - 2025/9/3
N2 - Human Action Recognition (HAR) plays a crucial role in applications such as health monitoring, smart home automation, and human-computer interaction. While HAR has been extensively studied, action summarization using Wi-Fi and IMU signals in smart-home environments, which involves identifying and summarizing continuous actions, remains an emerging task. This paper introduces the novel XRF V2 dataset, designed for indoor daily activity Temporal Action Localization (TAL) and Action Summarization. XRF V2 integrates multimodal data from Wi-Fi signals, IMU sensors (smartphones, smartwatches, earbuds, and smart glasses), and synchronized video recordings, offering a diverse collection of indoor activities from 16 volunteers across three environments. To tackle TAL and action summarization, we propose the XRFMamba neural network, which captures long-term dependencies in untrimmed sensory sequences and achieves the best performance with an average mAP of 78.74, outperforming the recent WiFiTAD by 5.49 points in mAP@avg while using 35% fewer parameters. In action summarization, we introduce a new metric, Response Meaning Consistency (RMC), to evaluate performance. It achieves an average RMC (mRMC) of 0.802. We envision XRF V2 as a valuable resource for advancing research in human action localization, action forecasting, pose estimation, multimodal foundation model pre-training, synthetic data generation, and more. The data and code are available at https://github.com/aiotgroup/XRFV2.
UR - https://www.scopus.com/pages/publications/105015368382
U2 - 10.1145/3749521
DO - 10.1145/3749521
M3 - Article
AN - SCOPUS:105015368382
SN - 2474-9567
VL - 9
JO - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
JF - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
IS - 3
ER -