Two-sample Calibrated Imputation in Surveys: A Methodological Framework for Secondary Data Analysis
Samuel Joel Kamun *
Department of Mathematics and Physics, Technical University of Mombasa, Kenya.
Richard Simwa
Department of Accounting, Finance and Economics, School of Business, KCA University, Nairobi, Kenya.
*Author to whom correspondence should be addressed.
Abstract
Nonresponse at the item level and other forms of missingness—arising from editing rejections, confidentiality requirements, or the treatment of extreme values—remain central obstacles in survey sampling and valid inference. To address these difficulties, we introduce a two-sample calibrated imputation approach that ensures consistent estimation of population and domain totals, together with their variances, while relying exclusively on variance formulae designed for complete data. Notably, the method does not depend on detailed survey design metadata or replication-based variance estimation procedures. The proposed framework combines data from the original survey with an auxiliary reference sample. This integration of information improves efficiency and reduces bias compared with methods based solely on a single survey sample. For continuous survey variables, the procedure can be carried out either through calibration-based reweighting or through imputation methods. Robust extensions are also available to limit the effect of outliers. The generality of the framework allows it to be applied in multivariate contexts, permitting the joint estimation of covariances across totals and enabling inference on contrasts such as ratios and differences. Its finite-sample behavior is examined through simulation experiments, with additional implementation guidelines provided in supplementary documentation. This approach is particularly beneficial for secondary data analysts, who can apply it to improve the accuracy and reliability of estimates derived from incomplete or combined survey data sources.
Keywords: Two-sample calibrated imputation, incomplete survey data, variance estimation, multivariate inference, robust methods