Keywords: classification, dataset, ensemble, feature selection, software defect prediction
Software defect prediction is the process of locating defective modules in software. It improves testing efficiency, and consequently software quality, by enabling timely identification of fault-prone modules. The use of single classifiers and ensembles for predicting software defects has produced inconsistent results. Previous analyses report that ensembles are often more accurate and less affected by noise in datasets, and that they achieve lower average error rates than any of their constituent classifiers. However, inconsistencies exist across these experiments, and the performance of learning algorithms may vary with the performance measure used and the circumstances of the evaluation. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. Feature selection reduces a dataset to fewer features and can improve the performance of both single classifiers and ensembles on that dataset. The goal of this paper is to assess the efficiency of ensemble methods in software defect prediction when combined with feature selection. This study compares the performance of four ensemble algorithms using 11 performance metrics over 11 software defect datasets from the NASA MDP repository. The results indicate that feature selection and the use of ensemble methods can improve the classification results of software defect prediction. Bagged ensemble models gave the best results. In addition, Voting and Stacking performed better than the individual base classifiers. Among the single classifiers, SMO performed best, outperforming Decision Tree (J48), MLP, and KNN both with and without feature selection. Thus, it can be concluded that feature selection can help improve the accuracy of both individual classifiers and ensemble methods by removing noisy and inconsistent features from the datasets.
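As a rough illustration of the Voting idea mentioned above, the sketch below combines the predictions of several base classifiers by majority vote and compares accuracies. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names `majority_vote` and `accuracy`, the classifier labels `clf_a` to `clf_c`, and all prediction values are hypothetical toy data invented for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several classifiers' label lists into one label per module.

    `predictions` is a list of equal-length label lists, one per classifier.
    An odd number of classifiers with binary labels avoids ties.
    """
    combined = []
    for labels in zip(*predictions):
        # Pick the label predicted by the most classifiers for this module.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of modules whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data, invented for illustration only (1 = defective, 0 = not defective).
actual = [1, 0, 1, 1, 0, 0]
base_predictions = {
    "clf_a": [1, 0, 0, 1, 0, 0],  # hypothetical base classifier
    "clf_b": [1, 1, 1, 1, 0, 0],  # hypothetical base classifier
    "clf_c": [0, 0, 1, 1, 0, 1],  # hypothetical base classifier
}

ensemble = majority_vote(list(base_predictions.values()))

for name, preds in base_predictions.items():
    print(name, round(accuracy(preds, actual), 3))
print("voting", round(accuracy(ensemble, actual), 3))
```

In this toy case each base classifier errs on different modules, so the vote corrects every individual mistake. That only happens when the base classifiers' errors are not strongly correlated, which is one reason results across studies can differ.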