The Impact of Feature Selection on Defect Prediction Performance: An Empirical Comparison

  • Zhou Xu
  • Jin Liu
  • Zijiang Yang
  • Gege An
  • Xiangyang Jia

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

141 Scopus citations

Abstract

Software defect prediction aims to determine whether a software module is defect-prone by constructing prediction models. The performance of such models is susceptible to the high dimensionality of the datasets, which may include irrelevant and redundant features. Feature selection is applied to alleviate this issue. Because many feature selection methods have been proposed, there is an imperative need to analyze and compare these methods. Prior empirical studies have potential controversies and limitations, such as contradictory results, the use of private datasets, and inappropriate statistical test techniques. This observation leads us to conduct a careful empirical study to reinforce confidence in the experimental conclusions by considering several potential sources of bias, such as noise in the datasets and the dataset types. In this paper, we investigate the impact of 32 feature selection methods on defect prediction performance over two versions of the NASA dataset (i.e., the noisy and clean NASA datasets) and one open-source AEEEM dataset. We use a state-of-the-art double Scott-Knott test technique to analyze these methods. Experimental results show that the effectiveness of these feature selection methods on defect prediction performance varies significantly over all the datasets.

Original language: English
Title of host publication: Proceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering, ISSRE 2016
Publisher: IEEE Computer Society
Pages: 309-320
Number of pages: 12
ISBN (Electronic): 9781467390019
DOIs
State: Published - 5 Dec 2016
Event: 27th IEEE International Symposium on Software Reliability Engineering, ISSRE 2016 - Ottawa, Canada
Duration: 23 Oct 2016 to 27 Oct 2016

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
ISSN (Print): 1071-9458

Conference

Conference: 27th IEEE International Symposium on Software Reliability Engineering, ISSRE 2016
Country/Territory: Canada
City: Ottawa
Period: 23/10/16 to 27/10/16

Keywords

  • defect prediction
  • feature selection
  • Scott-Knott test
