Analysis of Individual Loan Defaults Using Logit under Supervised Machine Learning Approach


Dominic M. Obare
Gladys G. Njoroge
Moses M. Muraya


Financial institutions hold large amounts of data on their borrowers, which can be used to predict the probability that a borrower will default on a loan. Models that have been used to predict individual loan defaults include linear discriminant analysis models and extreme value theory models. These models are parametric in nature, since they assume that the response being investigated takes a particular functional form. However, the functional form used to estimate the response may differ substantially from its actual functional form. The purpose of this research was to analyze individual loan defaults in Kenya using the logistic regression model. The data used in this study were obtained from Equity Bank of Kenya for the period between 2006 and 2016. A random sample of 1000 loan applicants whose loans had been approved by Equity Bank of Kenya during this period was obtained. The data covered credit history, purpose of the loan, loan amount, nature of the savings account, employment status, sex of the applicant, age of the applicant, security used when acquiring the loan, and the applicant's area of residence (rural or urban). This study employed a quantitative research design; it treats individual loan defaults as group characteristics of borrowers. The data were pre-processed by seeding in R and then split into a training data set and a test data set. The training data were used to fit the logistic regression model under a supervised machine learning approach. The R statistical software was used for the analysis of the data. The test data set was used to cross-validate the fitted logistic model, which was then used to predict individual loan defaults. The study thus focused on the analysis of individual loan defaults in Kenya using the logistic regression model in machine learning.
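The workflow described above (seed, split, fit on the training part, check on the test part) can be sketched as follows. This is a minimal illustration in Python, not the authors' R code. It reads "seeding" as fixing a random seed, which is an assumption. The data are synthetic stand-ins rather than the bank's records, the 0.45 test fraction is an assumed value (the abstract does not state the split ratio), and the names `seeded_split`, `fit_logistic` and `predict` are hypothetical helpers.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def seeded_split(rows, test_fraction, seed):
    """Shuffle a copy of rows with a fixed seed, then split off the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_logistic(rows, epochs=300, lr=0.5):
    """Fit weights w and intercept b by batch gradient descent on the log-loss.

    Each row is (features, label), with label 1 = default, 0 = non-default.
    """
    n, k = len(rows), len(rows[0][0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in rows:
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (default) when the fitted probability reaches the threshold."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold)


# Synthetic stand-in data (assumption): two numeric features per applicant
# and a label drawn from a known logistic model. NOT the study's data.
rng = random.Random(2019)
data = []
for _ in range(1000):
    x = (rng.gauss(0, 1), rng.gauss(0, 1))
    p_default = sigmoid(2.0 * x[0] - 1.0 * x[1])
    data.append((x, int(rng.random() < p_default)))

# Assumed 55/45 split; the seed makes the split reproducible.
train, test = seeded_split(data, test_fraction=0.45, seed=42)
w, b = fit_logistic(train)
test_accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
print(len(train), len(test), round(test_accuracy, 4))
```

Fixing the seed means that rerunning the script reproduces the same split, so training and test results stay comparable across runs. In practice a library routine such as R's `glm` with a binomial family would replace the hand-written gradient descent used here.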
On the training data set, the logistic regression model correctly predicted 303 defaults and 122 non-defaults, and misclassified 56 and 69 loans. The model had an accuracy of 0.7727 on the training data and 0.7333 on the test data. It showed a precision of 0.8440 and 0.8244 on the training and test data, respectively. The performance of the model on both the training and test data was illustrated using a plot of training errors and test errors against sample size on the same axes. The plot showed that the performance of the model improves as the sample size increases. The study recommended the use of logistic regression together with a supervised machine learning approach for loan default prediction in financial institutions, and further research on ensemble methods for loan default prediction in order to increase prediction accuracy.
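The reported training accuracy and precision can be checked directly against the four counts given above. The sketch below assumes that "default" is the positive class and that the 56 misclassified loans are false positives and the 69 are false negatives. The abstract does not say which is which, so this assignment is an assumption.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all loans that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of predicted defaults that were actual defaults."""
    return tp / (tp + fp)


# Training-set counts from the abstract; "default" is treated as positive.
tp, tn = 303, 122
# Assumed assignment of the two misclassification counts.
fp, fn = 56, 69

train_total = tp + tn + fp + fn
train_accuracy = accuracy(tp, tn, fp, fn)
train_precision = precision(tp, fp)

print(train_total, f"{train_accuracy:.4f}", f"{train_precision:.4f}")
# prints: 550 0.7727 0.8440
```

Under this reading the counts reproduce both reported training figures, and they sum to 550 training loans. Swapping the two misclassification counts would give a precision of about 0.8145 instead of 0.8440, which is why 56 is read here as the false positives.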

Keywords: Loan defaults, loan default prediction, logistic regression model, Kenya

Article Details

How to Cite
Obare, D., Njoroge, G., & Muraya, M. (2019). Analysis of Individual Loan Defaults Using Logit under Supervised Machine Learning Approach. Asian Journal of Probability and Statistics, 3(4), 1-12.
Original Research Article

