000 | 03016cam a2200349 a 4500 | ||
---|---|---|---|
003 | EG-GiCUC | ||
005 | 20250223032825.0 | ||
008 | 211005s2021 ua d f m 000 0 eng d | ||
040 |
_aEG-GiCUC _beng _cEG-GiCUC |
||
041 | 0 | _aeng | |
049 | _aDeposite | ||
097 | _aPh.D | ||
099 | _aCai01.20.04.Ph.D.2021.Ay.D | ||
100 | 0 | _aAyat Mahmoud Ahmed Mohamed | |
245 | 1 | 0 |
_aData cleaning using machine learning techniques / _cAyat Mahmoud Ahmed Mohamed ; Supervised Sherif Mazen , Ayman Elkilany , Farid Ali |
246 | 1 | 5 | _aتنقية البيانات باستخدام تقنيات تعلم الآلة |
260 |
_aCairo : _bAyat Mahmoud Ahmed Mohamed , _c2021 |
||
300 |
_a79 Leaves : _bcharts ; _c30cm |
||
502 | _aThesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems | ||
520 | _aData quality is one of the most important problems in data management, since corrupt data often leads to inaccurate analytics results and wrong business decisions. Detecting and repairing dirty data is a perennial challenge in data analytics. In today’s internet era, the volume of generated data keeps growing, and data from domains such as medicine, e-commerce, and social networking is of great importance. Many of these datasets are imbalanced: some categories contain a very large number of records while others are very rare. In other words, an imbalanced class distribution is a scenario in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. This problem is predominant where anomaly detection is crucial, as in electricity pilferage, fraudulent bank transactions, and the identification of rare diseases. Most classical machine learning algorithms have demonstrated shortcomings when used with imbalanced data. They do not work well for imbalanced data classification because they assume equal costs for each class, and so can be biased and inaccurate. This thesis explores the nature of the imbalanced data classification problem and presents a survey of existing machine learning algorithms along with a suggested taxonomy of imbalanced data learning approaches. It also presents a comparative study of the existing machine learning algorithms with respect to several factors, and then proposes three solutions to the challenge of imbalanced data classification. | ||
530 | _aIssued also as CD | ||
653 | 4 | _aClassification | |
653 | 4 | _aData Cleaning | |
653 | 4 | _aImbalanced | |
700 | 0 |
_aAyman Elkilany , _eSupervisor |
|
700 | 0 |
_aFarid Ali , _eSupervisor |
|
700 | 0 |
_aSherif Mazen , _eSupervisor |
|
856 | _uhttp://172.23.153.220/th.pdf | ||
905 |
_aNazla _eRevisor |
||
905 |
_aShimaa _eCataloger |
||
942 |
_2ddc _cTH |
||
999 |
_c82453 _d82453 |