TY - JOUR
T1 - An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes
AU - Zhang, Chun Xia
AU - Wang, Guan Wei
AU - Zhang, Jiang She
PY - 2012/4
Y1 - 2012/4
N2 - DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) is a classifier combination technique that constructs a set of diverse base classifiers using additional, artificially generated training instances. The predictions of the base classifiers are then integrated by the mean combination rule. To gain more insight into its effectiveness and advantages, this paper uses a large-scale experiment to conduct a bias-variance analysis of DECORATE and several other widely used ensemble methods (bagging, AdaBoost, and random forest) at different training sample sizes. The experimental results support the following conclusions. For small training sets, DECORATE has a dominant advantage over its rivals, and its success is attributed to its achieving a larger bias reduction than the other algorithms. As the amount of training data increases, AdaBoost benefits most: its bias reduction gradually becomes significant while its variance reduction remains moderate, so AdaBoost performs best with large training samples. Moreover, random forest consistently ranks second best for both small and large training sets; it mainly decreases variance while maintaining low bias. Bagging occupies an intermediate position, since it primarily reduces variance.
KW - AdaBoost
KW - bias-variance decomposition
KW - classifier combination method
KW - random forest
KW - training sample size
UR - https://www.scopus.com/pages/publications/84859177457
U2 - 10.1080/02664763.2011.620949
DO - 10.1080/02664763.2011.620949
M3 - Article
AN - SCOPUS:84859177457
SN - 0266-4763
VL - 39
SP - 829
EP - 850
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
IS - 4
ER -