Keywords: Clustering, data mining, hierarchical, K-means, KNIME, Tanagra, WEKA
Data mining is used to discover knowledge from information systems. Clustering is one of the techniques used for data mining. It can be defined as a technique for grouping unlabelled data objects such that objects belonging to the same cluster are similar to one another and dissimilar to the objects belonging to other clusters. Data mining tools are software used to efficiently analyse, summarize and extract useful information from different perspectives of data. This paper presents a comparative analysis of four open-source data mining software tools (WEKA, KNIME, Tanagra and Orange) in the context of data clustering, specifically the K-Means and hierarchical clustering methods. The performance analysis, based on execution time and quality of clusters, showed that WEKA outperforms the other tools, with the lowest sum of squared errors (SSE) of 199.7308 and an average execution time of 1.535 seconds. KNIME follows with an SSE of 222.217 but a longer average execution time of 7.13 seconds, then Tanagra with an SSE of 269.3902 and an average execution time of 2.01 seconds. Orange has the poorest performance, with an SSE of 388.78.
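For readers unfamiliar with the two measures reported above, the following is a minimal sketch of how an SSE value and an average execution time can be computed for a given clustering. It is an illustration only, not the procedure used by any of the four tools: the function names `sse` and `timed`, the toy points, the centroids and the cluster assignments are all hypothetical and do not come from the experiments in this paper.

```python
import time


def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, k in zip(points, labels):
        centroid = centroids[k]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


def timed(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times; return (last result, average wall-clock seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    return result, (time.perf_counter() - start) / repeats


# Hypothetical toy data: four 2-D points already assigned to two clusters.
points = [(1.0, 1.0), (1.0, 2.0), (4.0, 4.0), (5.0, 4.0)]
centroids = [(1.0, 1.5), (4.5, 4.0)]  # mean of each cluster
labels = [0, 0, 1, 1]                 # cluster index for each point

value, avg_seconds = timed(sse, points, centroids, labels)
print(f"SSE = {value:.4f}, average time = {avg_seconds:.6f} s")
```

A lower SSE indicates that points lie closer to their cluster centroids, which is why the tool with the lowest SSE is treated as producing the most compact clusters.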