Integrating Recursive Feature Elimination Technique to a Balanced Clustered Mental Health Data

Leah.W.Chege *

Department of Mathematics and Actuarial Science, Catholic University of Eastern Africa, Nairobi, 62157-00200, Kenya.

Hellen.W.Waititu

Department of Mathematics and Actuarial Science, Catholic University of Eastern Africa, Nairobi, 62157-00200, Kenya.

*Author to whom correspondence should be addressed.


Abstract

Mental health has been shown to be significant to an individual’s quality of life. Several factors may lead to psychological disorders, including biological, social and environmental factors, among others.

Aim: This study aims to filter the factors by selecting variables according to their impact on an individual’s emotional well-being. Understanding the least contributing factors will help researchers, the health sector, government bodies and others narrow their area of focus when dealing with mental health awareness, making it easier for them to provide better mental health care solutions.

Sample: A sample of 10,000 observations from generated data, comprising 12 variables, was used for the analysis. The variables were: generated identification number, gender, age, marital status, family members, residence, occupation, medical test, diagnosis, cause, treatment and payment.

Methodology: Random undersampling was first applied to the mental health data to deal with its imbalanced nature and thus reduce model selection bias. K-means clustering was then used to group the observations into distinct sub-groups, which helped improve the accuracy of the output in the study. Finally, Recursive Feature Elimination (RFE) was applied to the balanced, clustered data to select the variable that least affects an individual’s mental well-being.
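The first step of the methodology, random undersampling, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the toy records, the 90/10 class split and the use of `diagnosis` as the class label are all assumptions made for illustration. The sketch randomly discards observations from the larger classes until every class is the same size as the smallest one.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until all classes
    have as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the observations by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        # Sampling without replacement keeps each retained row unique.
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 10 "yes" records and 90 "no" records.
rows = [{"id": i, "diagnosis": "yes" if i < 10 else "no"} for i in range(100)]

balanced = random_undersample(rows, "diagnosis")
counts = Counter(row["diagnosis"] for row in balanced)
print(counts)
```

After this step both classes in the toy data contain the same number of records, at the cost of discarding most of the majority class. Clustering and feature elimination would then be run on the balanced records.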

Results: Based on the RFE plot, the cross-validation curve appeared to rise, which indicated that the model’s performance, that is the accuracy and F1 scores, was good and relatively stable across the range of variables. The marital status variable showed lower values for both the Root Mean Squared Error Standard Deviation (RMSESD) and the Mean Absolute Error Standard Deviation (MAESD), at 0.003605 and 0.0003382 respectively.

Conclusion: These findings show that marital status was selected as the least contributing factor to psychological conditions.

Keywords: Random undersampling technique, K means clustering technique, recursive feature elimination technique


How to Cite

Leah.W.Chege, and Hellen.W.Waititu. 2026. “Integrating Recursive Feature Elimination Technique to a Balanced Clustered Mental Health Data”. Asian Journal of Probability and Statistics 28 (1):1-12. https://doi.org/10.9734/ajpas/2026/v28i1850.
