The Use of Clustering and Classification Methods in Machine Learning and Comparison of Some Algorithms of the Methods
Abstract
This article applies two machine learning methods, classification and clustering, using the decision tree (DT), artificial neural network (ANN), and K-nearest neighbors algorithms. Weather data were used to compare the algorithms and methods, and the datasets also served to evaluate the effectiveness of the clustering method and of the data mining tool. Clustering was carried out with the Waikato Environment for Knowledge Analysis (WEKA), a data mining tool. DT was the best model according to the accuracy and precision measures, whereas ANN was the best according to the F-measure and the area under the receiver operating characteristic (ROC) curve.
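The abstract reports that different measures rank the models differently. As a minimal sketch of how that can happen, the Python snippet below computes accuracy, precision, and F-measure from binary confusion-matrix counts. The two sets of counts (`model_a`, `model_b`) are hypothetical values chosen for illustration only; they are not results from the study, and the function names are assumptions made for this example.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp, fn, tn):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fp, fn, tn):
    """Fraction of true positives that the model recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp, fp, fn, tn):
    """Harmonic mean of precision and recall (F1)."""
    p = precision(tp, fp, fn, tn)
    r = recall(tp, fp, fn, tn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts on the same 200 test instances.
# These numbers are invented for illustration, not taken from the paper.
model_a = {"tp": 40, "fp": 5, "fn": 30, "tn": 125}
model_b = {"tp": 60, "fp": 30, "fn": 10, "tn": 100}

for name, counts in (("model_a", model_a), ("model_b", model_b)):
    print(
        name,
        f"accuracy={accuracy(**counts):.3f}",
        f"precision={precision(**counts):.3f}",
        f"f_measure={f_measure(**counts):.3f}",
    )
```

With these invented counts, `model_a` scores higher on accuracy and precision while `model_b` scores higher on F-measure, because `model_b` trades some false positives for a much higher recall. A ranking by a single measure can therefore differ from a ranking by another.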
Copyright (c) 2023 Guhdar A. A. Mulla, Yıldırım Demir
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.