Data cleaning using machine learning techniques / Ayat Mahmoud Ahmed Mohamed ; supervised by Sherif Mazen, Ayman Elkilany, Farid Ali

By:

Ayat Mahmoud Ahmed Mohamed

Contributor(s):

Material type: Text

Language: English

Publication details: Cairo : Ayat Mahmoud Ahmed Mohamed, 2021

Description: 79 leaves : charts ; 30 cm

Other title:

تنقية البيانات باستخدام تقنيات تعلم الآلة (Data cleaning using machine learning techniques) [Added title page title]

Subject(s):

Online resources:

Click here to access online

Available additional physical forms:

Issued also as CD

Dissertation note: Thesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems

Summary: Data quality is one of the most important problems in data management, since corrupt data often leads to inaccurate analytics results and wrong business decisions. Detecting and repairing dirty data is therefore a perennial challenge in data analytics. In today's internet era, the volume of generated data keeps growing, and some of it, such as medical, e-commerce, and social networking data, is of great importance. Many of these datasets are imbalanced: some categories contain a very large number of records while others are very rare. In other words, an imbalanced class distribution is a scenario in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. This problem is predominant where anomaly detection is crucial, for example in detecting electricity pilferage, fraudulent bank transactions, and rare diseases. Most classical machine learning methods have demonstrated shortcomings when used with imbalanced data. Conventional machine learning algorithms do not work well for imbalanced data classification because they assume equal costs for each class, so they can be biased toward the majority class and inaccurate. This thesis explores the nature of the imbalanced data classification problem and presents a survey of existing machine learning algorithms, along with a suggested taxonomy of imbalanced data learning approaches. It also presents a comparative study of the existing machine learning algorithms with respect to several factors. It then proposes three solutions to the challenge of imbalanced data classification.
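The summary's point that equal-cost assumptions mislead on imbalanced data can be illustrated with a small sketch. It is not taken from the thesis: the 95/5 class split, the labels "legit" and "fraud", and the helper names are all hypothetical, and the weighting shown is a common inverse-frequency heuristic rather than any of the three solutions the thesis proposes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so misclassifying them costs more.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` records that were predicted as `positive`."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Hypothetical data: 95 legitimate records and 5 fraudulent ones.
y_true = ["legit"] * 95 + ["fraud"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["legit"] * 100

print(accuracy(y_true, y_pred))         # 0.95
print(recall(y_true, y_pred, "fraud"))  # 0.0
print(balanced_class_weights(y_true))   # fraud is weighted far above legit
```

Under these assumptions the always-"legit" classifier scores 95% accuracy while finding none of the rare class, which is why accuracy alone is a poor guide here. The inverse-frequency weights are one simple way to make errors on the rare class cost more during training.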

Holdings
Item type	Current library	Home library	Call number	Copy number	Status	Barcode
Thesis	Theses Hall - First Floor	New Central Library - Cairo University	Cai01.20.04.Ph.D.2021.Ay.D		Not for loan	01010110084357000
CD - Rom	Theses Storage - Basement	New Central Library - Cairo University	Cai01.20.04.Ph.D.2021.Ay.D	84357.CD	Not for loan	01020110084357000
