keywords: Feature selection, principal component analysis, information gain, decision tree
This paper investigates the influence of feature selection on the prediction of skin diseases using data mining techniques. Principal Component Analysis (PCA), Information Gain (IG) and Chi-square were the feature selection algorithms used to reduce the number of features in the skin diseases dataset. Classification was performed using Random Forest, C4.5 Decision Trees and Functional Tree (FT). Experimental results of the developed predictive model revealed that feature selection did not necessarily improve the accuracy and sensitivity of these classifiers, and where it did bring an improvement, the gain was small, about 1 percent.
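Information Gain and Chi-square are both filter methods. Each scores every feature against the class label independently, and the top-ranked features are kept. The following is a minimal sketch of how such scores can be computed, assuming categorical (already discretized) features. It is not the authors' implementation. The function names `entropy`, `information_gain`, `chi_square` and `rank_features` and the toy data are hypothetical. PCA is not covered, because it builds new combined components rather than ranking the original features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Class entropy minus the weighted class entropy after splitting on a categorical feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value by class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for y, y_count in label_counts.items():
            expected = f_count * y_count / n  # count expected if feature and class were independent
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score=information_gain):
    """Return feature names sorted from most to least informative under the given score."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: f1 separates the classes perfectly, f2 carries no information.
    labels = ["a", "a", "b", "b"]
    columns = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
    print(rank_features(columns, labels))                    # ['f1', 'f2']
    print(rank_features(columns, labels, score=chi_square))  # ['f1', 'f2']
```

Both scores agree on this toy example, but on real data they can rank features differently. Information Gain measures the reduction in class entropy, whereas Chi-square measures how far the observed feature-by-class counts deviate from the counts expected under independence. The number of top-ranked features to keep is a separate choice that this sketch leaves open.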