000 03016cam a2200349 a 4500
003 EG-GiCUC
005 20250223032825.0
008 211005s2021 ua d f m 000 0 eng d
040 _aEG-GiCUC
_beng
_cEG-GiCUC
041 0 _aeng
049 _aDeposite
097 _aPh.D
099 _aCai01.20.04.Ph.D.2021.Ay.D
100 0 _aAyat Mahmoud Ahmed Mohamed
245 1 0 _aData cleaning using machine learning techniques /
_cAyat Mahmoud Ahmed Mohamed ; Supervised by Sherif Mazen, Ayman Elkilany, Farid Ali
246 1 5 _aتنقية البيانات باستخدام تقنيات تعلم الآلة
260 _aCairo :
_bAyat Mahmoud Ahmed Mohamed ,
_c2021
300 _a79 leaves :
_bcharts ;
_c30 cm
502 _aThesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems
520 _aData quality is one of the most important problems in data management, since corrupt data often leads to inaccurate analytics results and wrong business decisions. Detecting and repairing dirty data is therefore a perennial challenge in data analytics. In today's internet era, the volume of generated data keeps growing, and some of it, such as medical, e-commerce, and social networking data, is of great importance. Many of these datasets are imbalanced: some categories contain a very large number of records while others are very rare. In other words, an imbalanced class distribution is a scenario in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. This problem is predominant where anomaly detection is crucial, such as electricity pilferage, fraudulent bank transactions, and identification of rare diseases. Most classical machine learning algorithms have demonstrated shortcomings when used with imbalanced data. Conventional machine learning algorithms do not work well for imbalanced data classification because they assume equal costs for each class, and can therefore be biased and inaccurate. This thesis explores the nature of the imbalanced data classification problem and presents a survey of existing machine learning algorithms, along with a suggested taxonomy of imbalanced data learning approaches. It also presents a comparative study of the existing machine learning algorithms with respect to a number of factors. It then proposes three solutions to the challenge of imbalanced data classification.
530 _aIssued also as CD
653 4 _aClassification
653 4 _aData Cleaning
653 4 _aImbalanced
700 0 _aAyman Elkilany ,
_eSupervisor
700 0 _aFarid Ali ,
_eSupervisor
700 0 _aSherif Mazen ,
_eSupervisor
856 _uhttp://172.23.153.220/th.pdf
905 _aNazla
_eRevisor
905 _aShimaa
_eCataloger
942 _2ddc
_cTH
999 _c82453
_d82453