The Use of Clustering and Classification Methods in Machine Learning and Comparison of Some Algorithms of the Methods

Keywords: algorithms, classification, clustering, machine learning, decision tree

Abstract

In this article, two machine learning methods, classification and clustering, are applied using the decision tree (DT), artificial neural network (ANN), and K-nearest neighbors algorithms. The datasets were used to evaluate the effectiveness of the clustering method and of the data mining tool, and weather data were used to compare the algorithms and methods. The study showed that DT was the best model according to the accuracy and precision measures, whereas ANN was the best model according to the F-measure and the area under the receiver operating characteristic (ROC) curve. The Waikato Environment for Knowledge Analysis (WEKA), a data mining tool, was used to carry out the clustering.
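
The four measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow: the function names and the toy labels and scores below are assumptions made only for illustration and are unrelated to the weather data used in the study.

```python
# Illustrative sketch only: binary-classification measures named in the abstract.
# All names and data here are hypothetical; they do not come from the article.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of true positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (the F1 score)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a random
    positive instance scores higher than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels, hard predictions, and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("accuracy :", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("F-measure:", f_measure(tp, fp, fn))
print("ROC area :", roc_auc(y_true, scores))
```

Accuracy and precision depend only on the hard predictions, while the ROC area depends on how the scores rank the instances. This is one reason two models can rank differently under different measures, as the abstract reports for DT and ANN.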

Author Biographies

Guhdar A. A. Mulla, Department of Economics, Faculty of Economics and Administrations, Nawroz University, Kurdistan Region, Iraq

Guhdar Abdulaziz Ahmed Mulla is an Assistant Lecturer at the Department of Economics, Faculty of Economics and Administrations, Nawroz University. He received a B.Sc. degree in statistics and an M.Sc. degree in data mining. His research interests are in Econometrics, Applied Statistics, Circular Data Analysis, Statistical Quality Control, and Quantitative Decision Making.

Yıldırım Demir, Department of Statistics, Faculty of Economics and Administrative Sciences, Van Yüzüncü Yıl University, Van, Turkey

Yıldırım Demir is an Assistant Professor at the Department of Statistics, Faculty of Economics and Administrative Sciences, Van Yüzüncü Yıl University. He received a B.Sc. degree in Electric, an M.Sc. degree in Biometrics, and a Ph.D. degree in Biometrics. His research interests are in Econometrics, Applied Statistics, Circular Data Analysis, Statistical Quality Control, and Quantitative Decision Making. Dr. Demir is a member of Turkey Society.

Published
2023-06-10
How to Cite
Mulla G, Demir Y. The Use of Clustering and Classification Methods in Machine Learning and Comparison of Some Algorithms of the Methods. cuesj [Internet]. 10 Jun. 2023 [cited 27 Apr. 2024];7(1):52-9. Available from: https://journals.cihanuniversity.edu.iq/index.php/cuesj/article/view/900
Section
Research Article