Diagnosing Multicollinearity of Logistic Regression Model
N. A. M. R. Senaviratna *
Department of Mathematics, The Open University of Sri Lanka, Sri Lanka.
T. M. J. A. Cooray
Department of Mathematics, University of Moratuwa, Sri Lanka.
*Author to whom correspondence should be addressed.
Abstract
A key problem that arises in binary logistic regression is that the explanatory variables considered for the model may be highly correlated among themselves. Multicollinearity causes unstable estimates and inaccurate variance estimates, which affect confidence intervals and hypothesis tests. The aim of this study was to discuss some diagnostic measures for detecting multicollinearity, namely tolerance, Variance Inflation Factor (VIF), condition index and variance proportions. These diagnostics are illustrated with data from a study of road accidents. The secondary data used in this study, covering 2014 to 2016, were acquired from the Traffic Police headquarters in Colombo, Sri Lanka. The response variable is accident severity, which has two levels: grievous and non-grievous. Multicollinearity is identified using the correlation matrix, tolerance and VIF values, and confirmed by the condition index and variance proportions. The solutions available for logistic regression include increasing the sample size, dropping one of the correlated variables and combining variables into an index. It is concluded that, without increasing the sample size, omitting one of the correlated variables can reduce multicollinearity considerably.
Keywords: Logistic regression, multicollinearity, tolerance, variance inflation factor, condition index
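As an illustration of the four diagnostics named in the abstract, the following minimal Python sketch computes tolerance, VIF, condition indices and variance proportions with NumPy. It makes several assumptions that are not taken from the study: the data are synthetic (three hypothetical predictors, two of them nearly identical), the diagnostics are computed on the unweighted predictor matrix rather than on a fitted logistic model, and the function names are invented for this example. The condition index and variance proportions follow the usual singular value decomposition of the column-scaled design matrix with an intercept.

```python
import numpy as np


def tolerance_and_vif(X):
    """Tolerance and VIF for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of the
    inverse correlation matrix of the predictors; tolerance is 1 / VIF.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return 1.0 / vif, vif


def condition_diagnostics(X):
    """Condition indices and variance proportions.

    An intercept column is added and every column is scaled to unit
    length before taking the singular value decomposition.
    Returns (cond_index, props), where props[k, j] is the share of the
    variance of coefficient j associated with dimension k.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    Z = Z / np.linalg.norm(Z, axis=0)
    _, d, vt = np.linalg.svd(Z, full_matrices=False)
    cond_index = d.max() / d
    phi = (vt.T ** 2) / d ** 2            # phi[j, k] = v_jk^2 / d_k^2
    props = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_index, props


# Synthetic data (an assumption for illustration, not the accident data):
# x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

tol, vif = tolerance_and_vif(X)
cond_index, props = condition_diagnostics(X)

print("Tolerance:", np.round(tol, 4))
print("VIF:", np.round(vif, 2))
print("Condition indices:", np.round(cond_index, 2))
print("Variance proportions (rows = dimensions, columns = intercept, x1, x2, x3):")
print(np.round(props, 3))
```

In this sketch the two nearly identical predictors should show low tolerance and high VIF, and the dimension with the largest condition index should carry most of the variance proportion for both of their coefficients, which is the pattern the diagnostics are meant to reveal. Dropping either x1 or x2 from X and rerunning the functions mirrors the remedy of omitting one of the correlated variables.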