Open Access Original Research Article

The Principal Component Analysis Biplot Predictions versus the Ordinary Least Squares Regression Predictions: The Anthropometric Case Study

Chisimkwuo John, Chukwuemeka O. Omekara, Godwin Okwara

Asian Journal of Probability and Statistics, Page 1-10
DOI: 10.9734/ajpas/2019/v3i430098

A defining feature of principal component analysis (PCA) applied to a multivariate data set is its ability to transform correlated variables into linearly uncorrelated principal components. Back-transforming these components, with the samples and variables approximated on a single calibrated plot, gives rise to the PCA biplot. In this work, the predictive property of the PCA biplot was used to visualize anthropometric measurements, namely weight (kg), height (cm), skinfold (cm), arm muscle circumference (AMC, cm), and mid-upper arm circumference (MUAC, cm), collected from students of the School of Nursing and Midwifery, Federal Medical Center (FMC), Umuahia, Nigeria. The adequacy and quality of the PCA biplot were calculated, and the predicted samples were then compared with ordinary least squares (OLS) regression predictions, since both predictions rely on minimizing an error sum of squares. The results suggest that PCA biplot prediction merits further consideration when handling correlated multivariate data sets, as its predictions, with a mean square error (MSE) of 0.00149, appear to be better than the OLS regression predictions, with an MSE of 29.452.
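The contrast between the two kinds of prediction can be illustrated with a minimal sketch. It is not the authors' procedure: it uses two variables and one principal component instead of five variables on a two-dimensional biplot, and the height and weight values are made-up toy numbers. The PCA error is the rank-1 reconstruction error summed over both variables, while the OLS error is the vertical residual of weight regressed on height, so the two figures measure different things and are shown side by side only to make the definitions concrete.

```python
import math

# Hypothetical toy data: (height in cm, weight in kg) for six people.
data = [(150.0, 48.0), (155.0, 52.0), (160.0, 55.0),
        (165.0, 61.0), (170.0, 64.0), (175.0, 70.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centered = [(x - mean_x, y - mean_y) for x, y in data]

# Entries of the 2x2 covariance matrix (divisor n; the directions do not depend on it).
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# Leading eigenvector of a symmetric 2x2 matrix, in closed form.
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
v = (math.cos(theta), math.sin(theta))


def pca_predict(cx, cy):
    """Project a centered sample on the first component and map it back."""
    score = cx * v[0] + cy * v[1]
    return score * v[0], score * v[1]


# Rank-1 PCA reconstruction error, summed over both variables.
pca_mse = sum((cx - px) ** 2 + (cy - py) ** 2
              for (cx, cy) in centered
              for (px, py) in [pca_predict(cx, cy)]) / n

# OLS regression of weight on height: vertical residuals only.
slope = sxy / sxx
ols_mse = sum((cy - slope * cx) ** 2 for cx, cy in centered) / n

print(f"PCA rank-1 MSE: {pca_mse:.4f}")
print(f"OLS MSE:        {ols_mse:.4f}")
```

In this two-variable setting the PCA figure cannot exceed the OLS figure, because the first component minimizes the orthogonal squared distances among all lines through the centroid.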

Open Access Original Research Article

Comparative Study of Failure Rate of Bank’s ATM: Log Normal Distribution Approach

Orumie, Ukamaka Cynthia, E. O. Biu

Asian Journal of Probability and Statistics, Page 1-19
DOI: 10.9734/ajpas/2019/v3i430099

This research examined the time-to-failure rate and the number of successful transactions of ATMs at selected banks in Nigeria using the log-normal distribution. A transformation was applied to the log-normal model to obtain a quadratic (polynomial) regression equation, which was used to determine the parameters of the log-normal model. In addition, one-way ANOVA was used to test for equality of the mean time to failure and the mean successful service time across the banks. The log-normal models of the banks were fitted with SPSS 21 statistical software, and the results showed that the GT Bank model had the highest variation (90.3%) for successful service time (t), while the Fidelity Bank model had the highest variation (56.6%) for time to failure. The one-way ANOVA on successful service time (min) showed a significant difference. Tukey comparison tests showed that GT Bank differed significantly from the other banks at the 5% and 10% levels. Hence, successful service time (min) was not the same for all five banks. However, the one-way ANOVA on time to failure (t) (min) showed no significant difference among the five banks.
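The abstract does not spell out the transformation, so the following is only one plausible reading: taking the natural log of the log-normal density gives a quadratic in x = ln t, and the fitted quadratic coefficients can be inverted to recover mu and sigma. The function names and the example values are hypothetical.

```python
import math


def lognormal_log_pdf(t, mu, sigma):
    """Natural log of the log-normal density at t > 0."""
    x = math.log(t)
    return (-x - math.log(sigma * math.sqrt(2.0 * math.pi))
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def quadratic_coefficients(mu, sigma):
    """Coefficients (c0, c1, c2) of ln f = c0 + c1*x + c2*x**2 with x = ln t."""
    c2 = -1.0 / (2.0 * sigma ** 2)
    c1 = mu / sigma ** 2 - 1.0
    c0 = (-mu ** 2 / (2.0 * sigma ** 2)
          - math.log(sigma * math.sqrt(2.0 * math.pi)))
    return c0, c1, c2


def parameters_from_coefficients(c1, c2):
    """Recover (mu, sigma) from fitted c1 and c2; requires c2 < 0."""
    sigma_sq = -1.0 / (2.0 * c2)
    mu = sigma_sq * (c1 + 1.0)
    return mu, math.sqrt(sigma_sq)


# Hypothetical parameter values, used only to check the round trip.
c0, c1, c2 = quadratic_coefficients(mu=1.2, sigma=0.5)
mu_hat, sigma_hat = parameters_from_coefficients(c1, c2)
print(mu_hat, sigma_hat)
```

In practice c1 and c2 would come from a polynomial regression of the log density (or a log-transformed frequency) on ln t, after which the same inversion applies.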

Open Access Original Research Article

Analysis of Individual Loan Defaults Using Logit under Supervised Machine Learning Approach

Dominic M. Obare, Gladys G. Njoroge, Moses M. Muraya

Asian Journal of Probability and Statistics, Page 1-12
DOI: 10.9734/ajpas/2019/v3i430100

Financial institutions hold a large amount of data on their borrowers, which can be used to predict the probability that a borrower will default on a loan. Models that have been used to predict individual loan defaults include linear discriminant analysis models and extreme value theory models. These models are parametric in nature, since they assume that the response being investigated takes a particular functional form. However, the functional form used to estimate the response may be very different from the actual functional form of the response. The purpose of this research was to analyze individual loan defaults in Kenya using the logistic regression model. The data used in this study were obtained from Equity Bank of Kenya for the period from 2006 to 2016. A random sample of 1000 loan applicants whose loans had been approved by Equity Bank of Kenya during this period was obtained. The data covered credit history, purpose of the loan, loan amount, nature of the savings account, employment status, sex of the applicant, age of the applicant, security used when acquiring the loan, and the area of residence of the applicant (rural or urban). This study employed a quantitative research design, since it deals with individual loan defaults as group characteristics of borrowers. The data were pre-processed by seeding in R and then split into a training data set and a test data set. The training data were used to train the logistic regression model using a supervised machine learning approach. R statistical software was used for the analysis of the data. The test data set was used to cross-validate the developed logistic model, which was later used to predict individual loan defaults. This study focused on the analysis of individual loan defaults in Kenya using the logistic regression model in machine learning.
On the training data set, the logistic regression model correctly predicted 303 defaults and 122 non-defaults, and misclassified 56 and 69 loans. The model had an accuracy of 0.7727 with the training data and 0.7333 with the test data. The logistic regression model showed a precision of 0.8440 and 0.8244 with the training and test data respectively. The performance of the model with both the training and test data was illustrated using a plot of training errors and test errors against sample size on the same axes. The plot showed that the performance of the model increases with sample size. The study recommended the use of logistic regression in conjunction with a supervised machine learning approach for loan default prediction in financial institutions, and recommended further research on ensemble methods for loan default prediction in order to increase prediction accuracy.
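The reported training-set accuracy and precision can be reproduced from the four counts in the abstract. The abstract does not say which of the two misclassification counts is which; the sketch below assumes that 56 non-defaults were predicted as defaults and 69 defaults were predicted as non-defaults, the assignment that is consistent with the stated precision. The variable names are illustrative.

```python
# Counts reported for the training data. Which of 56 and 69 is the
# false-positive count is an assumption, chosen to match the stated precision.
true_defaults = 303       # defaults predicted as defaults
true_non_defaults = 122   # non-defaults predicted as non-defaults
false_defaults = 56       # assumed: non-defaults predicted as defaults
missed_defaults = 69      # assumed: defaults predicted as non-defaults

total = true_defaults + true_non_defaults + false_defaults + missed_defaults

# Accuracy: share of all loans classified correctly.
accuracy = (true_defaults + true_non_defaults) / total

# Precision: share of predicted defaults that were actual defaults.
precision = true_defaults / (true_defaults + false_defaults)

print(f"accuracy  = {accuracy:.4f}")   # 0.7727
print(f"precision = {precision:.4f}")  # 0.8440
```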

Open Access Original Research Article

Empirical Convergence Rate of a Markov Transition Matrix

Steven T. Garren

Asian Journal of Probability and Statistics, Page 1-7
DOI: 10.9734/ajpas/2019/v3i430101

The convergence rate of a Markov transition matrix is governed by its second-largest eigenvalue, where the largest eigenvalue is unity, under general regularity conditions. Garren and Smith (2000) constructed confidence intervals for this second-largest eigenvalue based on asymptotic normality theory and performed simulations, which were somewhat limited in scope because of the limited computing power available at the time. Herein we focus on simulating coverage intervals using the greater computing power now available, and we compare our simulated coverage intervals to the theoretical confidence intervals from Garren and Smith (2000).
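The role of the second-largest eigenvalue can be seen in a small deterministic sketch. It does not reproduce the paper's simulations or confidence intervals; it uses a hypothetical two-state transition matrix and checks that the distance between an entry of the n-step matrix and its stationary value shrinks by a factor equal to the second eigenvalue at each step.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# Hypothetical two-state transition matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# For a 2x2 stochastic matrix the eigenvalues are 1 and (trace - 1).
second_eigenvalue = P[0][0] + P[1][1] - 1.0

# Stationary probability of state 0: b / (a + b), with a = P[0][1], b = P[1][0].
stationary_0 = P[1][0] / (P[0][1] + P[1][0])

# Distance of the (0, 0) entry of P^n from its limit, for n = 1..6.
errors = []
power = P
for _ in range(6):
    errors.append(abs(power[0][0] - stationary_0))
    power = mat_mul(power, P)

# Each ratio of successive errors should be close to the second eigenvalue.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(second_eigenvalue, ratios)
```

For larger matrices the second eigenvalue would have to be computed numerically, and in the statistical setting of the abstract it is estimated from observed transitions, not read off a known matrix.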

Open Access Original Research Article

Multinomial Logistic Modelling of Socio-Economic Factors Influencing Spending Behavior of University Students

Gogo Jacqueline Akelo, Stephen Muteti Mbunzi, Cyrus Gitonga Ngari

Asian Journal of Probability and Statistics, Page 1-23
DOI: 10.9734/ajpas/2019/v3i430102

This study examines the use of the multinomial logistic regression (MLR) model, one of the important methods for categorical data analysis. The model deals with a single nominal or ordinal response variable that has more than two categories. Although many researchers have applied this model to data in areas such as behavioral, social, health, and educational research, a study on the spending habits of university students has never been done. To apply the model in practice, we conducted a survey among students of the University of Embu. From the undergraduate student population, a sample of 376 was selected using stratified random sampling, with simple random sampling without replacement in each stratum. The response variable consisted of five categories. Four explanatory variables were used to build the primary MLR model. The model was subjected to a set of statistical tests to ensure its appropriateness for the data. The results reveal that year of study, family financial level, gender, and school are significant factors in explaining the spending habits of students. Although gender is one of the determining factors of students' financial behavior, the model identified family level of income as a major determining factor. In conclusion, the MLR model accurately describes the relationship between the group of explanatory variables and the response variable. It also identifies the effect of each variable and makes it possible to predict the classification of any individual case. The researchers recommend that the university's peer counselling department hold trainings focused on the major determinant of spending behavior, i.e. family financial level.
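How a fitted multinomial logistic model turns explanatory variables into a predicted category can be sketched as follows. The coefficients, the numeric coding of the four explanatory variables, and the function name are all hypothetical; none of them come from the study. In practice a categorical variable such as school would be entered as dummy variables, not as a single number.

```python
import math


def category_probabilities(features, coefficients):
    """Multinomial logit probabilities with category 0 as the reference.

    coefficients[k] holds (intercept, b1, ..., bp) for non-reference
    category k + 1.
    """
    scores = [0.0]  # linear predictor of the reference category is fixed at 0
    for intercept, *slopes in coefficients:
        scores.append(intercept + sum(b * x for b, x in zip(slopes, features)))
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coding of the four explanatory variables:
# year of study, family financial level, gender, school.
student = [2.0, 1.0, 0.0, 3.0]

# Hypothetical coefficients for the four non-reference response categories.
coefficients = [
    (0.2, 0.10, -0.30, 0.05, 0.00),
    (-0.1, 0.05, 0.40, -0.20, 0.10),
    (0.3, -0.15, 0.20, 0.10, -0.05),
    (-0.4, 0.20, 0.10, 0.00, 0.05),
]

probabilities = category_probabilities(student, coefficients)
predicted_category = probabilities.index(max(probabilities))
print(probabilities, predicted_category)
```

With five response categories there are four sets of coefficients, one for each category other than the reference, and the predicted class is the category with the largest probability.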