
Data driven automated machine learning pipeline recommendation framework / (Record no. 178676)

MARC details
000 -LEADER
fixed length control field	12450namaa22004451i 4500
003 - CONTROL NUMBER IDENTIFIER
control field	EG-GICUC
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20260309113159.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	260223s2025 ua a|||frm||| 000 0 eng d
040 ## - CATALOGING SOURCE
Original cataloguing agency	EG-GICUC
Language of cataloging	eng
Transcribing agency	EG-GICUC
Modifying agency	EG-GICUC
Description conventions	rda
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	eng
Language code of summary or abstract	eng
--	ara
049 ## - Acquisition Source
Acquisition Source	Deposit
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.31
092 ## - LOCALLY ASSIGNED DEWEY CALL NUMBER (OCLC)
Classification number	006.31
Edition number	21
097 ## - Degree
Degree	Ph.D
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Local Call Number	Cai01.20.04.Ph.D.2025.Ib.D
100 0# - MAIN ENTRY--PERSONAL NAME
Authority record control number or standard number	Ibrahim Gomaa Ibrahim Abdelghany,
Preparation	preparation.
245 10 - TITLE STATEMENT
Title	Data driven automated machine learning pipeline recommendation framework /
Statement of responsibility, etc.	by Ibrahim Gomaa Ibrahim Abdelghany ; Supervision Prof. Dr. Hoda Mokhtar Omar Mokhtar, Prof. Dr. Neamat El-Tazi, Dr. Ali Zidane.
246 15 - VARYING FORM OF TITLE
Title proper/short title	اطار عمل توصيات آلى مدفوع بالبيانات لخط أنابيب التعلم الآلى
264 #0 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Date of production, publication, distribution, manufacture, or copyright notice	2025.
300 ## - PHYSICAL DESCRIPTION
Extent	101 leaves :
Other physical details	illustrations ;
Dimensions	30 cm. +
Accompanying material	CD.
336 ## - CONTENT TYPE
Content type term	text
Source	rda content
337 ## - MEDIA TYPE
Media type term	Unmediated
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Source	rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (Ph.D)-Cairo University, 2025.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note	Bibliography: pages 92-101.
520 #3 - SUMMARY, ETC.
Summary, etc.	Machine Learning (ML) and Automated Machine Learning (Auto-ML) have attracted growing attention in recent years. ML pipelines include repetitive tasks such as data pre-processing, feature engineering, model selection, and hyperparameter optimization. Building a machine learning model requires extensive time for development, stress testing, and multiple experiments. Even with a small search space of pipeline steps and multiple algorithms, building a model takes hours. Hence, Auto-ML has been widely adopted to save time and effort on such tasks. Auto-ML aims to minimize human involvement in the loop when building ML solutions. Consequently, it facilitates the development of ML for businesses, ML experts, and non-technical users. Auto-ML frameworks can be used in three broad domains: supervised learning, unsupervised learning, and deep learning. While these frameworks have shown promise, significant gaps persist, particularly in supervised and unsupervised learning contexts. This thesis addresses these limitations through novel methodological contributions and comprehensive empirical validation.
In the realm of supervised learning, existing Auto-ML frameworks have many limitations. Most focus on only part of the ML pipeline, such as hyperparameter tuning or model selection, rather than optimizing the end-to-end workflow, which leads to suboptimal solutions for specific datasets. Furthermore, the absence of meta-learning integration restricts their adaptability, forcing users to initiate exhaustive pipeline searches for every new task instead of leveraging historical knowledge to derive generalized, robust solutions. Compounding these issues is the inadequate handling of class-imbalanced datasets, a prevalent challenge in real-world applications. To address these gaps, this work introduces SML-AutoML, a meta-learning-driven framework designed for automated algorithm selection and pipeline optimization. The proposed system holistically automates the supervised learning pipeline, spanning data preprocessing, feature engineering, model selection, and hyperparameter optimization, while explicitly incorporating meta-learning to transfer knowledge across tasks. Additionally, it integrates advanced resampling techniques and cost-sensitive learning to mitigate class imbalance, ensuring robust performance on skewed datasets.
While Auto-ML research has predominantly focused on supervised learning, the automation of unsupervised learning, particularly clustering, remains underexplored despite its broad applicability in domains such as customer segmentation, financial analytics, and marketing strategy. Current automated clustering frameworks prioritize dataset characteristics but neglect critical factors such as algorithmic properties, computational constraints, and user-specific requirements (e.g., interpretability or scalability). To bridge this gap, we propose SOL-Auto-Clust, an end-to-end framework that automates the entire clustering pipeline. SOL-Auto-Clust synthesizes data characteristics (e.g., dimensionality, sparsity), algorithmic traits (e.g., sensitivity to noise, scalability), and user-defined objectives to recommend optimal clustering workflows. The framework automates labor-intensive tasks, including data normalization, feature transformation, cluster count estimation via novel validity metrics, and algorithm selection. It further incorporates multi-objective optimization to balance competing criteria such as cluster cohesion, runtime efficiency, and alignment with user-defined objectives.
Both frameworks were rigorously evaluated on diverse open-source datasets. SML-AutoML demonstrated superior performance over state-of-the-art tools (e.g., Auto-Sklearn, TPOT) across metrics such as accuracy, precision, and recall. Similarly, SOL-Auto-Clust outperformed existing clustering Auto-ML baselines (e.g., AutoCluster, AutoClust, ML2DAC) across metrics such as silhouette score and Adjusted Rand Index (ARI).
520 #3 - SUMMARY, ETC.
Summary, etc.	Machine learning (ML) and automated machine learning (Auto-ML) have received increasing attention in recent years. The ML pipeline includes repetitive tasks such as data preprocessing, feature engineering, model selection, and hyperparameter optimization. Building an ML model requires considerable time for development, stress testing, and multiple experiments. In addition, building a model with a small search space of pipeline steps and multiple algorithms takes hours. Auto-ML has therefore been widely adopted to save time and effort on such tasks. Auto-ML aims to reduce human intervention in the loop while building ML tasks. As a result, it facilitates ML development for businesses, ML experts, and non-technical users. Auto-ML frameworks are broadly classified into three domains: supervised learning, unsupervised learning, and deep learning. Although these frameworks have shown progress, significant gaps remain, particularly in supervised and unsupervised learning contexts. This thesis addresses these limitations through novel methodological contributions and comprehensive empirical validation.
In supervised learning, current Auto-ML frameworks suffer from many limitations: most focus on only part of the ML pipeline, such as hyperparameter tuning or model selection, rather than optimizing the end-to-end workflow, which leads to suboptimal solutions for specific datasets. Moreover, the absence of meta-learning integration limits their adaptability, forcing users to start exhaustive pipeline searches for every new task instead of drawing on historical knowledge to derive general, robust solutions. These problems are compounded by inadequate handling of class-imbalanced datasets, a common challenge in real-world applications. To address these gaps, this work presents SML-AutoML, a meta-learning-driven framework designed for automated algorithm selection and pipeline optimization. The proposed system comprehensively automates the supervised learning pipeline, spanning data preprocessing, feature engineering, model selection, and hyperparameter optimization, while explicitly incorporating meta-learning to transfer knowledge across tasks. In addition, it integrates advanced resampling techniques and cost-sensitive learning to mitigate class imbalance, ensuring robust performance on skewed datasets.
Although Auto-ML research has focused mainly on supervised learning, the automation of unsupervised learning, especially clustering, remains unexplored despite its wide applicability in domains such as customer segmentation, financial analytics, and marketing strategy. Current automated clustering frameworks prioritize dataset characteristics but neglect critical factors such as algorithmic properties, computational constraints, and user-specific requirements (such as interpretability or scalability). To bridge this gap, we propose SOL-Auto-Clust, an end-to-end framework that automates the entire clustering pipeline. SOL-Auto-Clust combines data characteristics (such as dimensionality and sparsity), algorithmic traits (such as sensitivity to outliers and scalability), and user-defined objectives to recommend an optimal clustering workflow. The framework automates labor-intensive tasks, including data normalization, feature transformation, estimation of the number of clusters via novel validity metrics, and algorithm selection. It also includes multi-objective optimization to balance competing criteria such as cluster cohesion, runtime efficiency, and alignment with user-defined objectives.
Both frameworks were rigorously evaluated on diverse open-source datasets. SML-AutoML showed superior performance over state-of-the-art tools (such as Auto-Sklearn and TPOT) across metrics such as accuracy, precision, and recall. Likewise, SOL-Auto-Clust outperformed current clustering Auto-ML baselines (such as AutoCluster, AutoClust, and ML2DAC) across metrics such as silhouette score and Adjusted Rand Index (ARI).
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE
Issues CD	Issued also as CD.
546 ## - LANGUAGE NOTE
Text Language	Text in English and abstract in Arabic & English.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Machine learning
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	التعلم الآلى
653 #1 - INDEX TERM--UNCONTROLLED
Uncontrolled term	Automated Machine Learning (Auto-ML)
--	hyperparameter optimization (HPO)
--	Meta-learning, supervised learning
--	CASH
--	Automated clustering, unsupervised learning
--	الذكاء الاصطناعى
--	التعلم الألة
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Hoda Mokhtar Omar Mokhtar
Relator term	thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Neamat El-Tazi
Relator term	thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Ali Zidane
Relator term	thesis advisor.
900 ## - Thesis Information
Grant date	01-01-2023
Supervisory body	Hoda Mokhtar Omar Mokhtar
--	Neamat El-Tazi
--	Ali Zidane
Universities	Cairo University
Faculties	Faculty of Computers and Artificial Intelligence
Department	Department of Information Systems
905 ## - Cataloger and Reviser Names
Cataloger Name	Shimaa
Reviser Names	Eman Ghareb
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Thesis
Edition	21
Suppress in OPAC	No

Holdings
Source of classification or shelving scheme	Home library	Current library	Date acquired	Inventory number	Full call number	Barcode	Date last seen	Effective from	Koha item type
Dewey Decimal Classification	New Central Library - Cairo University	University Theses Hall - First Floor	23.02.2026	93447	Cai01.20.04.Ph.D.2025.Ib.D	01010110093447000	23.02.2026	23.02.2026	Thesis
