TY - JOUR
T1 - FMW-Net
T2 - a first-order meta-weight-net approach for sample weighting
AU - Zhou, Yubo
AU - Shu, Jun
AU - Liu, Junmin
AU - Meng, Deyu
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
PY - 2025/10
Y1 - 2025/10
N2 - Deep neural networks (DNNs) have achieved impressive performance in various applications, but are susceptible to overfitting to biases in the training data, such as label noise and class imbalance. Example reweighting methods can address this issue, but they often require manually specifying the form of the weighting function. Recently, the Meta-Weight-Net (MW-Net) method has been proposed to automatically learn a weighting function parameterized by a multi-layer perceptron (MLP) in a meta-learning manner. However, updating MW-Net is computationally expensive because bilevel optimization requires second-order gradient computation. To address this issue, we propose a First-order MW-Net (FMW-Net) algorithm based on a value-function approach, which relies solely on first-order gradient information. The new learning algorithm scales better owing to its lower compute and memory costs (compared to MW-Net, the time cost is reduced to approximately 33%, and the memory cost is reduced to 75%), making it both practical and efficient for large-scale deep learning models, e.g., large language models. We present empirical results demonstrating its superior practical efficiency. Source code is available at https://github.com/ybzhouni/FMW-Net.
AB - Deep neural networks (DNNs) have achieved impressive performance in various applications, but are susceptible to overfitting to biases in the training data, such as label noise and class imbalance. Example reweighting methods can address this issue, but they often require manually specifying the form of the weighting function. Recently, the Meta-Weight-Net (MW-Net) method has been proposed to automatically learn a weighting function parameterized by a multi-layer perceptron (MLP) in a meta-learning manner. However, updating MW-Net is computationally expensive because bilevel optimization requires second-order gradient computation. To address this issue, we propose a First-order MW-Net (FMW-Net) algorithm based on a value-function approach, which relies solely on first-order gradient information. The new learning algorithm scales better owing to its lower compute and memory costs (compared to MW-Net, the time cost is reduced to approximately 33%, and the memory cost is reduced to 75%), making it both practical and efficient for large-scale deep learning models, e.g., large language models. We present empirical results demonstrating its superior practical efficiency. Source code is available at https://github.com/ybzhouni/FMW-Net.
KW - Bilevel optimization
KW - Example reweighting
KW - Meta learning
KW - Scalable meta-learning
UR - https://www.scopus.com/pages/publications/105006926038
U2 - 10.1007/s13042-025-02681-2
DO - 10.1007/s13042-025-02681-2
M3 - Article
AN - SCOPUS:105006926038
SN - 1868-8071
VL - 16
SP - 7689
EP - 7706
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 10
ER -