TY - GEN
T1 - An Empirical Study on Model Pruning and Quantization
AU - Tian, Yuzhe
AU - Luan, Tom H.
AU - Zheng, Xi
N1 - Publisher Copyright:
© 2023, ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering.
PY - 2023
Y1 - 2023
N2 - In machine learning, model compression is vital for resource-constrained Internet of Things (IoT) devices, such as unmanned aerial vehicles (UAVs) and smartphones. Several state-of-the-art (SOTA) compression methods exist, but few studies have evaluated these techniques across different models and datasets. In this paper, we present an in-depth study of two SOTA model compression methods, pruning and quantization. We apply these methods to AlexNet, ResNet18, VGG16BN, and VGG19BN on three well-known datasets: Fashion-MNIST, CIFAR-10, and UCI-HAR. From our study, we conclude that applying pruning and retraining can preserve performance, with only minor average degradation, while reducing the model size on spatial-domain datasets (e.g., images); that performance on temporal-domain datasets (e.g., motion-sensor data) degrades more; and that quantization performance depends on the pruning rate and the network architecture. We also compare different clustering methods and reveal their impact on model accuracy and quantization ratio. Finally, we suggest some promising directions for future research.
AB - In machine learning, model compression is vital for resource-constrained Internet of Things (IoT) devices, such as unmanned aerial vehicles (UAVs) and smartphones. Several state-of-the-art (SOTA) compression methods exist, but few studies have evaluated these techniques across different models and datasets. In this paper, we present an in-depth study of two SOTA model compression methods, pruning and quantization. We apply these methods to AlexNet, ResNet18, VGG16BN, and VGG19BN on three well-known datasets: Fashion-MNIST, CIFAR-10, and UCI-HAR. From our study, we conclude that applying pruning and retraining can preserve performance, with only minor average degradation, while reducing the model size on spatial-domain datasets (e.g., images); that performance on temporal-domain datasets (e.g., motion-sensor data) degrades more; and that quantization performance depends on the pruning rate and the network architecture. We also compare different clustering methods and reveal their impact on model accuracy and quantization ratio. Finally, we suggest some promising directions for future research.
KW - Deep neural network
KW - Edge computing
KW - Model compression
UR - https://www.scopus.com/pages/publications/85172691657
U2 - 10.1007/978-3-031-40467-2_7
DO - 10.1007/978-3-031-40467-2_7
M3 - Conference contribution
AN - SCOPUS:85172691657
SN - 9783031404665
T3 - Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
SP - 111
EP - 125
BT - Broadband Communications, Networks, and Systems - 13th EAI International Conference, BROADNETS 2022, Proceedings
A2 - Wang, Wei
A2 - Wu, Jun
PB - Springer Science and Business Media Deutschland GmbH
T2 - Proceedings of the 13th EAI International Conference on Broadband Communications, Networks, and Systems, BROADNETS 2022
Y2 - 12 March 2023 through 13 March 2023
ER -