Comparative Analysis of K-Means and Divisive Clustering Techniques on Balanced Mental Health Data

Leah.W.Chege *

Department of Mathematics and Actuarial Science, Catholic University of Eastern Africa, Nairobi, 62157-00200, Kenya.

Hellen.W.Waititu

Department of Mathematics and Actuarial Science, Catholic University of Eastern Africa, Nairobi, 62157-00200, Kenya.

Cornelious.O.Nyakundi

Department of Mathematics and Actuarial Science, Catholic University of Eastern Africa, Nairobi, 62157-00200, Kenya.

*Author to whom correspondence should be addressed.


Abstract

Effective clustering of mental health data can reveal patterns and relationships that are critical for understanding mental health conditions. This study compared clustering techniques applied to balanced mental health data, so as to avoid the biases associated with imbalanced data. Clustering was carried out with respect to the area of residence feature. First, random undersampling and SMOTE were applied to the imbalanced data set as balancing techniques to improve model performance. Random undersampling proved the better balancing technique, with accuracy, recall, precision and F-score all equal to 1. Two clustering techniques, K-means and divisive clustering, were then applied to the randomly undersampled data. To select between them, two validation approaches were used: internal validation and stability validation. K-means produced slightly lower Average Proportion of Non-overlap (APN), Average Distance between Means (ADM) and Figure of Merit (FOM) values (0.12, 0.41 and 0.9972, respectively) than divisive clustering (0.14, 0.42 and 0.9999). Because lower values of these stability measures indicate more consistent clusters, the study concluded that K-means clustering performs better. These findings can guide future researchers analysing mental health data on ways to improve model performance and obtain more reliable predictions.

Keywords: Mental health, residence, random undersampling, k-means clustering technique, divisive clustering technique, stability validation, internal validation
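
As an illustration of the balancing step mentioned in the abstract, the following is a minimal sketch of random undersampling in plain Python. It is not the authors' implementation: the function name `random_undersample`, the toy records, and the use of a `residence` field as the class label are assumptions made for this example only.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, label_of, seed=0):
    """Return a class-balanced subset of `rows`.

    Every class is reduced to the size of the smallest class by
    sampling without replacement. `label_of` maps a row to its class.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    # The smallest class sets the target size for every class.
    n_min = min(len(members) for members in groups.values())

    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], n_min))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 "urban" records and 3 "rural" records.
rows = [{"residence": "urban", "score": i} for i in range(8)]
rows += [{"residence": "rural", "score": i} for i in range(3)]

balanced = random_undersample(rows, lambda r: r["residence"])
counts = Counter(r["residence"] for r in balanced)
```

In this sketch the majority class is cut down to the size of the minority class, so `counts` holds the same number of records for each residence value. Undersampling discards data, which is the usual trade-off against oversampling methods such as SMOTE.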


How to Cite

Leah.W.Chege, Hellen.W.Waititu, and Cornelious.O.Nyakundi. 2024. “Comparative Analysis of K-Means and Divisive Clustering Techniques on Balanced Mental Health Data”. Asian Journal of Probability and Statistics 26 (10):64-79. https://doi.org/10.9734/ajpas/2024/v26i10659.