Software defect prediction using data categorization and machine learning techniques /
Moheb Mofied Ragheb Henein
Software defect prediction using data categorization and machine learning techniques / التنبؤ بعيوب البرامج عن طريق استخدام طرق تصنيف البيانات وتقنيات التعلم الآلى Moheb Mofied Ragheb Henein ; Supervised Salwa K. Abdelhafiz , Doaa M. Shawky - Cairo : Moheb Mofied Ragheb Henein , 2020 - 81 P . : charts ; 30cm
Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Mathematics and Physics
In this thesis, two approaches are proposed to overcome two main challenges in software defect prediction (SDP): class imbalance and class overlap. The first approach, Clustering-based Undersampling Artificial Neural Network (CU-ANN), tackles the imbalance problem. The second approach, Hybrid sampling Cost-Sensitive Support Vector Machine (HCSVM), balances the data set by undersampling the majority class instances and oversampling the minority class ones. Moreover, minority samples are categorized by severity, where the degree of severity is directly proportional to the number of neighbors belonging to the majority class. Taking the severity of minority samples into account in the learning phase alleviates the impact of class overlap. A cost-sensitive approach that assigns high misclassification costs to difficult minority samples takes these samples into account rather than treating them as outliers. Experiments are conducted on the NASA MDP benchmark data sets, which are the most widely used data sets in SDP performance evaluation.
Artificial Neural Network; Defect; Software Defect Prediction
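
The severity categorization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the neighborhood size, the distance metric, or how severity is turned into a misclassification cost, so everything below is an assumption made for illustration: the function names `severity_scores` and `misclassification_costs`, the parameter `k`, Euclidean distance, and the linear cost mapping are all hypothetical and are not taken from the thesis.

```python
import math


def severity_scores(X, y, k=5, minority=1):
    """Return {index: severity} for every minority sample.

    Severity is taken here to be the fraction of a sample's k nearest
    neighbors (Euclidean distance, an assumption) that belong to the
    majority class, so it ranges from 0.0 to 1.0.
    """
    scores = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # Distance from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        if not neighbors:
            continue  # no other samples to compare against
        majority_count = sum(1 for j in neighbors if y[j] != minority)
        scores[i] = majority_count / len(neighbors)
    return scores


def misclassification_costs(scores, base_cost=1.0, max_extra=4.0):
    """Hypothetical linear mapping: higher severity, higher cost."""
    return {i: base_cost + max_extra * s for i, s in scores.items()}


# Toy data: label 0 is the majority (non-defective) class,
# label 1 is the minority (defective) class.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 1, 1, 1]

scores = severity_scores(X, y, k=3)
costs = misclassification_costs(scores)
for i in sorted(scores):
    print(i, round(scores[i], 3), round(costs[i], 3))
```

In this toy set, the minority sample surrounded by majority samples receives the highest severity and therefore the highest cost, which reflects the idea of weighting difficult minority samples instead of discarding them as outliers. Per-sample costs of this kind could then be supplied to a cost-sensitive classifier as sample weights.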