Performance Comparison of Imputation Methods for Mixed Data Missing at Random with Small and Large Sample Data Set with Different Variability

Kyei Baffour Afari *

Department of Mathematics and Statistics, East Tennessee State University, P. O. Box 70663, Johnson City, TN 37614, United States of America.

Christina Nicole Holder Lewis

ETSU Quillen College of Medicine, Office of Academic Affairs, P.O. Box 70571, Johnson City, TN 37614, United States of America.

*Author to whom correspondence should be addressed.


Abstract

Missing data are a persistent concern in statistics because they can bias parameter estimates and lead to inaccurate results. Multiple imputation is one remedy for handling missing data. This study compared multiple imputation methods for mixed-variable datasets across different sample sizes, levels of variability, and levels of missingness. Three imputation methods were evaluated: predictive mean matching, classification and regression trees, and random forest. For each dataset, the multiple regression parameter estimates from the complete data were compared with the estimates obtained from the imputed data. Random forest imputation performed best in most cases for a sample size of 500, irrespective of the variability. Classification and regression tree imputation performed best in most cases for a sample size of 30, irrespective of the variability.

Keywords: Predictive mean matching, classification and regression tree, random forest, multiple imputation by chained equations
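
The comparison described in the abstract (impute, refit the regression, compare estimates with those from the complete data) can be illustrated with a minimal sketch. It is not the authors' code: it covers only a simplified, single-predictor form of predictive mean matching, uses simulated continuous data rather than mixed data, and omits the parameter-draw step that full implementations include. The function names (`fit_line`, `pmm_impute`, `simulate`, `make_mar`), the true coefficients, the missingness probabilities, and the donor count `k` are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, k=5, rng=None):
    """Fill None entries of y by simplified predictive mean matching on x.

    Each missing value is replaced by the observed y of a donor drawn at
    random from the k observed cases whose predicted means are closest.
    """
    rng = rng or random.Random()
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([p[0] for p in obs], [p[1] for p in obs])
    filled = []
    for xi, yi in zip(x, y):
        if yi is not None:
            filled.append(yi)
            continue
        target = a + b * xi
        donors = sorted(obs, key=lambda p: abs(a + b * p[0] - target))[:k]
        filled.append(rng.choice(donors)[1])
    return filled


def simulate(n, sd, rng):
    """Hypothetical data: y = 2 + 3x + noise, with noise standard deviation sd."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 + 3 * xi + rng.gauss(0, sd) for xi in x]
    return x, y


def make_mar(x, y, rng, p_low=0.1, p_high=0.4):
    """Delete y with a probability that depends only on the observed x (MAR)."""
    return [
        None if rng.random() < (p_high if xi > 0 else p_low) else yi
        for xi, yi in zip(x, y)
    ]


rng = random.Random(0)
x, y = simulate(500, 1.0, rng)
y_missing = make_mar(x, y, rng)

# Create m imputed datasets, refit the model on each, and pool the slopes
# by averaging (the point-estimate part of Rubin's rules).
m = 5
slopes = [fit_line(x, pmm_impute(x, y_missing, rng=rng))[1] for _ in range(m)]
pooled_slope = statistics.fmean(slopes)
complete_slope = fit_line(x, y)[1]

print("complete-data slope:", round(complete_slope, 3))
print("pooled imputed slope:", round(pooled_slope, 3))
```

Changing the sample size passed to `simulate` (for example 30 instead of 500) or the noise standard deviation gives a rough feel for how the gap between the two slopes depends on sample size and variability, which is the kind of contrast the study examines.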


How to Cite

Afari, Kyei Baffour, and Christina Nicole Holder Lewis. 2022. “Performance Comparison of Imputation Methods for Mixed Data Missing at Random With Small and Large Sample Data Set With Different Variability”. Asian Journal of Probability and Statistics 20 (2):16-39. https://doi.org/10.9734/ajpas/2022/v20i2416.
