TY  - BOOK
AU  - Moheb Mofied Ragheb Henein
AU  - Doaa M. Shawky
AU  - Salwa K. Abdelhafiz
TI  - Software defect prediction using data categorization and machine learning techniques
PY  - 2020///
CY  - Cairo
PB  - Moheb Mofied Ragheb Henein
KW  - Artificial Neural Network
KW  - Defect
KW  - Software Defect Prediction
N1  - Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Mathematics and Physics; Issued also as CD
N2  - In this thesis, two approaches are proposed to overcome two main challenges in software defect prediction (SDP), namely class imbalance and class overlap. The first approach is the Clustering-based Undersampling Artificial Neural Network (CU-ANN), which tackles the imbalance problem. The second approach is the Hybrid sampling Cost-Sensitive Support Vector Machine (HCSVM), which balances the data set by undersampling the majority class instances and oversampling the minority class ones. Moreover, minority samples are categorized by severity, where the degree of severity is directly proportional to the number of neighbors belonging to the majority class. Taking the severity of minority samples into account in the learning phase alleviates the impact of class overlap. The cost-sensitive approach assigns high misclassification costs to difficult minority samples, so that these samples are considered during learning rather than treated as outliers. Experiments are conducted on the NASA MDP benchmark data sets, which are the most used data sets in SDP performance evaluation.
UR  - http://172.23.153.220/th.pdf
ER  -