TY - JOUR
T1 - GSLTA-CDFSAR
T2 - Global Sequences and Local Tuples Alignment for Cross-Domain Few-Shot Action Recognition
AU - Guo, Fei
AU - Qi, Han
AU - Zhang, Xuetao
AU - Zhu, Li
AU - Sun, Jing
N1 - Publisher Copyright: © 2025
PY - 2025/2/28
Y1 - 2025/2/28
AB - Few-shot action recognition (FSAR) has made substantial progress, but it primarily addresses problems within a single domain, and its effectiveness is often questioned when applied across different domains. This is mainly due to inductive biases in the data distribution during meta-training, including spatial and temporal distribution biases. These combined biases further complicate adaptation in videos, making it challenging for models trained in one domain to adapt to another. To address this problem, we first enhance the source domain videos with frames from unlabeled target domain videos. We then employ a dual-branch structure to process the videos. The first branch, named the Domain Temporal branch, simultaneously handles global sequences of videos from both the source and target domains, while the second branch, named the Local-Global Adapter branch, compares local tuples of videos with global sequences from the source domain. We align the source-domain meta-learning results from the first branch with those from the second branch, enabling us to obtain domain-invariant information solely from the source domain. Concurrently, in the first branch, we perform a reconstruction operation on the target domain videos, allowing the model to extract features that approach the target domain. Our code is available at https://github.com/cofly2014/GSLTA.git.
KW - Cross-domain
KW - Few-shot action recognition
KW - Multiple-level distillation
UR - https://www.scopus.com/pages/publications/85215930424
U2 - 10.1016/j.knosys.2025.113041
DO - 10.1016/j.knosys.2025.113041
M3 - Article
AN - SCOPUS:85215930424
SN - 0950-7051
VL - 311
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 113041
ER -