Uncovering Risk Factors for Heart Disease and Predicting Outcomes Using Machine Learning Approaches
Shohel Mahmud *
Department of Statistics, Noakhali Science and Technology University, Noakhali, Bangladesh.
Salma Akter Tania
Department of Statistics, Noakhali Science and Technology University, Noakhali, Bangladesh.
Tanzila Tamanna
Department of Statistics and Data Science, Jahangirnagar University, Savar, Dhaka, Bangladesh.
Md Habibur Rahman
Department of Statistics and Data Science, Jahangirnagar University, Savar, Dhaka, Bangladesh.
Saiful Islam
Department of Statistics and Data Science, Jahangirnagar University, Savar, Dhaka, Bangladesh.
*Author to whom correspondence should be addressed.
Abstract
Aims: This study aims to develop a robust machine learning model that accurately detects the presence of heart disease, and to identify the classification model best suited to prediction based on heart disease risk factors.
Study Design: Analytical cross-sectional study.
Place and Duration of Study: Department of Statistics, Noakhali Science and Technology University, and three tertiary-level hospitals in Bangladesh (Noakhali General Hospital, Chittagong Medical College Hospital, and the National Institute of Cardiovascular Diseases, Dhaka), from June 2022 to August 2023.
Methodology: The study follows a descriptive design. Data were collected from hospital-admitted patients with symptoms of heart disease and from an equal number of patients with no heart-related disease. Primary data were obtained using a self-designed questionnaire administered by the researchers. The sample comprised 340 participants (247 males and 93 females), selected by convenience sampling.
Results: In the simulation study, the Decision Tree was the most compelling model owing to its high accuracy, interpretability, and statistical significance. Analysis of the real data likewise showed the Decision Tree to be the best-performing model for predicting heart disease risk, with an accuracy of 91%, a sensitivity of 88%, and a specificity of 91%.
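For readers unfamiliar with the three reported metrics, the following minimal Python sketch shows how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its figures.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: diseased patients correctly classified as diseased
    fn: diseased patients classified as healthy
    tn: healthy patients correctly classified as healthy
    fp: healthy patients classified as diseased
    """
    positives = tp + fn  # all truly diseased cases
    negatives = tn + fp  # all truly healthy cases
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must be present to compute all metrics.")

    return {
        # share of all cases classified correctly
        "accuracy": (tp + tn) / (positives + negatives),
        # share of diseased cases the model detects (true positive rate)
        "sensitivity": tp / positives,
        # share of healthy cases the model clears (true negative rate)
        "specificity": tn / negatives,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone does not reveal whether errors fall mainly on diseased or on healthy patients.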
Conclusion: The results identify the most effective machine learning algorithms for classifying heart disease risk based on its risk factors. Future research could extend this study by incorporating additional clinical, demographic, and social determinants.
Keywords: Machine learning, model comparison, accuracy, heart disease prediction, Bangladesh