TY - JOUR
T1 - The impact of class imbalance techniques on crashing fault residence prediction models
AU - Zhao, Kunsong
AU - Xu, Zhou
AU - Yan, Meng
AU - Zhang, Tao
AU - Xue, Lei
AU - Fan, Ming
AU - Keung, Jacky
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/3
Y1 - 2023/3
AB - Software crashes occur when a program executes incorrectly or is forcibly interrupted, which negatively impacts user experience. Since stack traces offer exception-related information about software crashes, researchers have used features collected from stack traces to automatically identify whether the fault that caused a crash resides inside the stack trace, aiming to accelerate crash localization. A recent work conducted the first large-scale empirical study, which investigated the impact of feature selection methods on the performance of classification models for this task. However, crash data are intrinsically class-imbalanced, i.e., the number of crash instances whose fault resides inside the stack trace differs greatly from the number whose fault resides outside it, a characteristic ignored by the previous work. To fill this gap, we conduct a large-scale empirical study to explore how different imbalanced learning techniques impact the performance of crashing fault residence prediction models on a benchmark dataset comprising seven software projects, using four evaluation indicators. Our experimental results demonstrate that two imbalanced variants of the bagging classifier perform better than the other compared techniques in both the normal and cross-project settings, and consistently achieve excellent prediction performance even as the imbalance level changes.
KW - Crash localization
KW - Empirical study
KW - Imbalanced learning
KW - Stack trace
UR - https://www.scopus.com/pages/publications/85148695597
U2 - 10.1007/s10664-023-10294-y
DO - 10.1007/s10664-023-10294-y
M3 - Article
AN - SCOPUS:85148695597
SN - 1382-3256
VL - 28
JO - Empirical Software Engineering
JF - Empirical Software Engineering
IS - 2
M1 - 49
ER -