TY - JOUR
T1 - CSCC
T2 - Cross-Scene Crowd Counting via Learning to Diversify for Domain Generalization
AU - Chen, Yuehai
AU - Wang, Qingzhong
AU - Yang, Jing
AU - Chen, Badong
AU - Xiong, Haoyi
AU - Du, Shaoyi
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - It is challenging for crowd counting models to generalize to new scenes due to domain shifts in training and test data. Although domain adaptation approaches have made notable progress in bridging the domain gap, they require target domain data. In this paper, we propose a novel framework for cross-scene crowd counting, which unifies domain generalization and adaptation. For domain generalization, we train a model only using single-domain data and the model can be generalized to any scene with satisfying performance. Regarding domain adaptation, we use both source and target domain data to further improve the performance. We first design a generation network that diversifies the generated samples to cover the unseen target domains as much as possible by minimizing mutual information. This approach simulates training data in various domains, thereby enhancing the model's generalization ability. Then we develop a pixel-wise supervised contrastive loss function that pulls the human heads in the source images and generated images closer to each other and pushes them further away from the background. This loss helps extract a domain-invariant feature representation, thus improving the model's generalization ability. Moreover, if information about the target domain is available, our generalization method can be easily applied as an adaptation method by replacing the mutual information minimization loss with the mutual information maximization loss. This can further improve cross-scene crowd counting performance. The experimental results demonstrate the strong generalizability of our method across different datasets.
AB - It is challenging for crowd counting models to generalize to new scenes due to domain shifts in training and test data. Although domain adaptation approaches have made notable progress in bridging the domain gap, they require target domain data. In this paper, we propose a novel framework for cross-scene crowd counting, which unifies domain generalization and adaptation. For domain generalization, we train a model only using single-domain data and the model can be generalized to any scene with satisfying performance. Regarding domain adaptation, we use both source and target domain data to further improve the performance. We first design a generation network that diversifies the generated samples to cover the unseen target domains as much as possible by minimizing mutual information. This approach simulates training data in various domains, thereby enhancing the model's generalization ability. Then we develop a pixel-wise supervised contrastive loss function that pulls the human heads in the source images and generated images closer to each other and pushes them further away from the background. This loss helps extract a domain-invariant feature representation, thus improving the model's generalization ability. Moreover, if information about the target domain is available, our generalization method can be easily applied as an adaptation method by replacing the mutual information minimization loss with the mutual information maximization loss. This can further improve cross-scene crowd counting performance. The experimental results demonstrate the strong generalizability of our method across different datasets.
KW - Cross-sence crowd counting
KW - domain adaptation
KW - domain generalization
UR - https://www.scopus.com/pages/publications/85216899269
U2 - 10.1109/TMM.2025.3535302
DO - 10.1109/TMM.2025.3535302
M3 - 文章
AN - SCOPUS:85216899269
SN - 1520-9210
VL - 27
SP - 3320
EP - 3330
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -