Ensemble Machine Learning Application and Feature Importance Detection in Stock Price Prediction
Sunday M. Okoh
Department of Statistics, University of Nigeria, Nsukka, Nigeria.
Everestus O. Ossai *
Department of Statistics, University of Nigeria, Nsukka, Nigeria.
Tobias E. Ugah
Department of Statistics, University of Nigeria, Nsukka, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
In this study, the performance of several ensemble machine learning techniques for stock price prediction in volatile financial markets was investigated. Random Forests, Gradient Boosting and Stacking were employed, alongside feature importance evaluation methods such as the Least Absolute Shrinkage and Selection Operator (LASSO), Shapley Additive Explanations (SHAP) and Gini Feature Importance, to forecast prices for Google and the S&P 500 index using historical data spanning different periods between 2004 and 2023. The Stacking model achieved a lower Mean Absolute Error (MAE) and an R-squared (R²) closer to 1, slightly outperforming both Random Forests and Gradient Boosting across all the datasets. Feature importance analysis and SHAP identified the features ‘High’, ‘Open’ and ‘Low’ as key contributors to stock price predictions. These analyses further enhanced the models by improving robustness and reducing overfitting. This study highlights that ensemble methods not only improve predictive accuracy but also offer valuable interpretability, which is crucial for financial analysts and decision-makers. Overall, it demonstrates the potential of combining ensemble techniques with feature importance analysis for stock price prediction, offering a framework that can be adapted to other financial forecasting applications. A practical implication of the findings is that, for the datasets investigated, market participants should attach more importance to the features ‘High’ and ‘Low’ than to the other features that drive stock price movement.
Keywords: Stock price, machine learning, prediction, feature importance, robustness
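The abstract ranks the models by MAE and R², where a lower MAE and an R² closer to 1 indicate a better fit. As a minimal sketch, assuming the standard textbook definitions (MAE as the mean absolute residual, R² as one minus the ratio of residual to total sum of squares), the two metrics can be computed as follows. The paper's own implementation is not shown here; the price values and the names `model_a` and `model_b` are hypothetical toy inputs, not data or results from the study.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|); lower is better."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; values closer to 1 indicate a better fit."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy closing prices and predictions from two hypothetical models
# (illustrative only, not taken from the Google or S&P 500 datasets).
actual = [100.0, 102.0, 101.0, 105.0]
model_a = [101.0, 101.0, 102.0, 104.0]
model_b = [100.5, 102.0, 101.5, 104.5]

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    mae = mean_absolute_error(actual, pred)
    r2 = r_squared(actual, pred)
    print(f"{name}: MAE={mae:.3f}, R^2={r2:.3f}")
```

On these toy numbers, `model_b` has both the lower MAE (0.375 versus 1.000) and the R² closer to 1 (about 0.946 versus 0.714), so under the criterion stated in the abstract it would be preferred.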