Data cleaning using machine learning techniques / (Record no. 82453)

MARC details
000 -LEADER
fixed length control field 03016cam a2200349 a 4500
003 - CONTROL NUMBER IDENTIFIER
control field EG-GiCUC
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20250223032825.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 211005s2021 ua d f m 000 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-GiCUC
Language of cataloging eng
Transcribing agency EG-GiCUC
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title eng
049 ## - LOCAL HOLDINGS (OCLC)
Holding library Deposite
097 ## - Thesis Degree
Thesis Level Ph.D
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Classification number Cai01.20.04.Ph.D.2021.Ay.D
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Ayat Mahmoud Ahmed Mohamed
245 10 - TITLE STATEMENT
Title Data cleaning using machine learning techniques /
Statement of responsibility, etc. Ayat Mahmoud Ahmed Mohamed ; supervised by Sherif Mazen, Ayman Elkilany, Farid Ali
246 15 - VARYING FORM OF TITLE
Title proper/short title تنقية البيانات باستخدام تقنيات تعلم الآلة
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. Cairo :
Name of publisher, distributor, etc. Ayat Mahmoud Ahmed Mohamed ,
Date of publication, distribution, etc. 2021
300 ## - PHYSICAL DESCRIPTION
Extent 79 Leaves :
Other physical details charts ;
Dimensions 30cm
502 ## - DISSERTATION NOTE
Dissertation note Thesis (Ph.D.) - Cairo University - Faculty of Computers and Artificial Intelligence - Department of Information Systems
520 ## - SUMMARY, ETC.
Summary, etc. Data quality is one of the most important problems in data management, since corrupt data often leads to inaccurate analytics results and wrong business decisions. Detecting and repairing dirty data is a perennial challenge in data analytics, and failure to do so can result in inaccurate analytics and unreliable decisions. In today's internet era, the volume of generated data keeps growing, and some of it, such as medical, e-commerce, and social networking data, is of great importance. Many of these datasets are imbalanced: some categories contain a very large number of records while others are very rare. In other words, imbalanced class distribution is a scenario where the number of observations belonging to one class is significantly lower than the number belonging to the other classes. This problem is predominant in scenarios where anomaly detection is crucial, such as electricity pilferage, fraudulent bank transactions, and identification of rare diseases. Most classical machine learning algorithms have demonstrated shortcomings when used with imbalanced data. Conventional machine learning algorithms do not work well for imbalanced data classification because they assume equal costs for each class, so they can be biased and inaccurate. This thesis explores the nature of the imbalanced data classification problem and presents a survey of existing machine learning algorithms along with a suggested taxonomy for all imbalanced data learning approaches. It also presents a comparative study of the existing machine learning algorithms with respect to some factors. It then proposes three solutions to the challenge of imbalanced data classification.
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE
Additional physical form available note Issued also as CD
653 #4 - INDEX TERM--UNCONTROLLED
Uncontrolled term Classification
653 #4 - INDEX TERM--UNCONTROLLED
Uncontrolled term Data Cleaning
653 #4 - INDEX TERM--UNCONTROLLED
Uncontrolled term Imbalanced
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name Ayman Elkilany ,
Relator term
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name Farid Ali ,
Relator term
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name Sherif Mazen ,
Relator term
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://172.23.153.220/th.pdf
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN)
Cataloger Nazla
Reviser Revisor
905 ## - LOCAL DATA ELEMENT E, LDE (RLIN)
Cataloger Shimaa
Reviser Cataloger
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Not for loan
Home library New Central Library - Cairo University
Current library Theses Hall - First Floor
Date acquired 11.02.2024
Full call number Cai01.20.04.Ph.D.2021.Ay.D
Barcode 01010110084357000
Date last seen 22.09.2023
Koha item type Thesis
Copy number

Source of classification or shelving scheme Dewey Decimal Classification
Not for loan
Home library New Central Library - Cairo University
Current library Theses Store - Basement
Date acquired 11.02.2024
Full call number Cai01.20.04.Ph.D.2021.Ay.D
Barcode 01020110084357000
Date last seen 22.09.2023
Koha item type CD - Rom
Copy number 84357.CD