Abstract
Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much more difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, can suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional methods for handling imbalanced data was conducted over 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is, on average, superior to the conventional methods.
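The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the binary data are converted, so the sketch assumes the majority (non-defective) samples are split at random into pseudo-classes of roughly the minority size. The nearest-centroid base learner and every function name here are hypothetical, chosen only to keep the example free of dependencies.

```python
import random
from collections import Counter
from itertools import combinations


def to_balanced_multiclass(labels, minority=1, seed=0):
    """Relabel majority samples into pseudo-classes of roughly minority size.

    Assumption: the majority class is partitioned at random; the abstract
    does not specify the splitting strategy.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_bins = max(1, round(len(majority_idx) / max(1, len(minority_idx))))
    rng.shuffle(majority_idx)
    new_labels = [None] * len(labels)
    for i in minority_idx:
        new_labels[i] = 0                  # class 0 = defective (minority)
    for pos, i in enumerate(majority_idx):
        new_labels[i] = 1 + pos % n_bins   # classes 1..n_bins = non-defective bins
    return new_labels


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_against_one(X, multi_labels):
    """Build one nearest-centroid learner per pair of classes (hypothetical base learner)."""
    cents = {
        c: centroid([x for x, y in zip(X, multi_labels) if y == c])
        for c in set(multi_labels)
    }
    return [(a, b, cents[a], cents[b]) for a, b in combinations(sorted(cents), 2)]


def predict_defective(model, x):
    """Majority vote over all pairwise learners, mapped back to a binary answer."""
    votes = Counter()
    for a, b, ca, cb in model:
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    winner = votes.most_common(1)[0][0]
    return winner == 0                     # only class 0 means "defective"
```

With 3 defective and 9 non-defective modules, the relabeling yields four classes of 3 samples each, so every pairwise learner is trained on balanced data while no original sample is discarded or duplicated.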
| Original language | English |
|---|---|
| Article number | 6392473 |
| Pages (from-to) | 1806-1817 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews |
| Volume | 42 |
| Issue number | 6 |
| State | Published - 2012 |
Keywords
- Class-imbalance data
- meta learning
- multiclassifier
- software defect prediction
Title: Using coding-based ensemble learning to improve software defect prediction