Mai Ahmed Mohsen Moustafa

Handling mixed missing data / التعامل مع البيانات المفقودة المختلطة / Mai Ahmed Mohsen Moustafa ; supervised by Amany Mousa Mohamed, Yasmin Mohamed Ibrahim. - Cairo : Mai Ahmed Mohsen Moustafa, 2018. - 154 leaves : charts, facsimiles ; 30 cm

Thesis (M.Sc.) - Cairo University - Institute of Statistical Studies and Research - Department of Statistics and Econometrics

Incomplete data are an often unavoidable problem for applied researchers, since survey results usually include some non-response. Various techniques have been developed for handling missing values in data sets with homogeneous attributes (that is, data sets whose independent attributes are either all continuous or all discrete). However, these imputation algorithms cannot be applied directly to many real data sets, because survey data in general consist of large numbers of variables of mixed types, i.e. with different measurement scales. Specific methods, and modifications of existing methods, have been proposed for such data. This thesis reviews some of these methods and applies six of them: MICE, MICE-CART, MICE-RF, MissForest, MissRanger and KNN. Their performance is assessed on 3 real data sets at 5 different missing rates. Complete data sets were used, values were artificially made "missing at random", and the results were assessed using different criteria. Across the imputed data sets, MissForest and MissRanger tend to give the best results, while MICE-RF and KNN tend to give the worst.
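The evaluation design described in the abstract (start from complete data, remove values artificially, impute, compare with the known truth) can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the thesis's procedure: the data are synthetic, the five missing rates are hypothetical, cells are removed completely at random (a simplification of the "missing at random" mechanism), the imputer is a plain mean/mode baseline rather than any of the six methods, and the two error measures (a normalised RMSE for numeric cells, the share of wrongly imputed labels for categorical cells) stand in for the unspecified "different criteria".

```python
import random
import statistics


def ampute(rows, rate, rng):
    """Copy rows and set roughly `rate` of the cells to None (completely at random)."""
    masked = [(i, key) for i, row in enumerate(rows) for key in row if rng.random() < rate]
    holed = [dict(row) for row in rows]
    for i, key in masked:
        holed[i][key] = None
    return holed, masked


def impute_baseline(rows):
    """Baseline imputer: observed mean for numeric columns, observed mode for categorical ones."""
    filled = [dict(row) for row in rows]
    for key in rows[0]:
        observed = [row[key] for row in rows if row[key] is not None]
        if not observed:
            continue  # a fully missing column cannot be filled by this baseline
        if isinstance(observed[0], (int, float)):
            fill = statistics.fmean(observed)
        else:
            fill = statistics.mode(observed)
        for row in filled:
            if row[key] is None:
                row[key] = fill
    return filled


def score(truth, imputed, masked):
    """Per-column error on the masked cells only."""
    errors = {}
    for key in truth[0]:
        cells = [i for i, k in masked if k == key]
        if not cells:
            continue
        if isinstance(truth[0][key], (int, float)):
            # RMSE divided by the standard deviation of the true column (assumed normalisation).
            sd = statistics.pstdev([row[key] for row in truth])
            mse = sum((truth[i][key] - imputed[i][key]) ** 2 for i in cells) / len(cells)
            errors[key] = (mse ** 0.5) / sd if sd else float("nan")
        else:
            # Proportion of masked labels that were imputed incorrectly.
            errors[key] = sum(truth[i][key] != imputed[i][key] for i in cells) / len(cells)
    return errors


rng = random.Random(0)
# Synthetic mixed-type data: one continuous and one categorical variable.
truth = [{"age": rng.gauss(40, 10), "group": rng.choice(["a", "a", "b", "c"])} for _ in range(200)]

results = {}
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):  # hypothetical missing rates
    holed, masked = ampute(truth, rate, rng)
    results[rate] = score(truth, impute_baseline(holed), masked)

for rate, errors in results.items():
    print(rate, errors)
```

Swapping `impute_baseline` for another imputer with the same input and output shape would allow several methods to be compared on identical masks, which is the kind of comparison the abstract reports.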



MICE, MICE-CART, MICE-RF