The impact of class imbalance techniques on crashing fault residence prediction models

  • Kunsong Zhao
  • Zhou Xu
  • Meng Yan
  • Tao Zhang
  • Lei Xue
  • Ming Fan
  • Jacky Keung

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Software crashes occur when a program executes incorrectly or is forcibly interrupted, which degrades the user experience. Because stack traces carry exception-related information about a crash, researchers have used features extracted from the stack trace to automatically predict whether the crashing fault resides in the stack trace, with the aim of accelerating crash localization. A recent work conducted the first large-scale empirical study of this task, investigating how feature selection methods affect the performance of classification models. However, crash data are intrinsically class-imbalanced, i.e., the number of crash instances whose fault lies inside the stack trace differs greatly from the number whose fault lies outside it, and the previous work ignored this characteristic. To fill this gap, we conduct a large-scale empirical study of how different imbalanced learning techniques affect the performance of crashing fault residence prediction models, using a benchmark dataset comprising seven software projects and four evaluation indicators. Our experimental results demonstrate that two imbalanced variants of the bagging classifier outperform the other compared techniques in both the normal and cross-project settings, and consistently deliver excellent prediction performance even as the imbalance level changes.

Original language: English
Article number: 49
Journal: Empirical Software Engineering
Volume: 28
Issue number: 2
State: Published - Mar 2023

Keywords

  • Crash localization
  • Empirical study
  • Imbalanced learning
  • Stack trace
