Using coding-based ensemble learning to improve software defect prediction

Research output: Contribution to journal › Article › peer-review

180 Scopus citations

Abstract

Using classification methods to predict software defect proneness from static code attributes has attracted a great deal of attention. The class imbalance of software defect data makes prediction much more difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional methods for handling imbalanced data was conducted over 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is, on average, superior to the conventional methods.
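The two-step idea in the abstract (convert the imbalanced binary data into balanced multiclass data, then combine per-pair learners under a one-against-one coding scheme) can be illustrated with a minimal sketch. The abstract does not say how the conversion is performed, so splitting the majority class into random pseudo-classes of roughly minority size is an assumption here, as is the nearest-centroid base learner. All function names are hypothetical and this is not the paper's implementation.

```python
import random
from collections import Counter
from itertools import combinations


def to_balanced_multiclass(labels, minority=1, seed=0):
    """Relabel majority samples into pseudo-classes of roughly minority size.

    Assumption: minority samples become class 0 and majority samples are
    spread at random over classes 1..k.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_minority = len(labels) - len(majority_idx)
    k = max(1, round(len(majority_idx) / max(1, n_minority)))
    rng.shuffle(majority_idx)
    new_labels = [0] * len(labels)
    for pos, i in enumerate(majority_idx):
        new_labels[i] = 1 + pos % k
    return new_labels, k


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_vs_one(X, multi_labels):
    """Build one nearest-centroid learner per pair of classes."""
    classes = sorted(set(multi_labels))
    cents = {
        c: centroid([x for x, y in zip(X, multi_labels) if y == c])
        for c in classes
    }
    return [(a, b, cents[a], cents[b]) for a, b in combinations(classes, 2)]


def predict_defective(model, x):
    """Majority vote over all pairwise learners, mapped back to binary."""
    votes = Counter()
    for a, b, ca, cb in model:
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    # Ties are broken in favour of the smaller class label.
    winner = max(votes, key=lambda c: (votes[c], -c))
    return winner == 0  # class 0 is the original minority (defective) class


# Toy data: 2 defective modules (label 1) and 6 clean modules (label 0).
X = [[10, 10], [11, 9], [0, 0], [1, 0], [0, 1], [1, 1], [0, 2], [2, 0]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
multi_y, k = to_balanced_multiclass(y)
model = fit_one_vs_one(X, multi_y)
```

Because every pseudo-class has about as many samples as the minority class, each pairwise learner is trained on roughly balanced data without discarding or duplicating any original sample. The final decision only asks whether the winning class is the minority class.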

Original language: English
Article number: 6392473
Pages (from-to): 1806-1817
Number of pages: 12
Journal: IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews
Volume: 42
Issue number: 6
State: Published - 2012

Keywords

  • Class-imbalance data
  • meta learning
  • multiclassifier
  • software defect prediction
