Panel data analysis using supervised machine learning techniques / (Record no. 178988)

MARC details
000 -LEADER
fixed length control field 07796namaa22004331i 4500
003 - CONTROL NUMBER IDENTIFIER
control field EG-GICUC
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20260312145719.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 260312s2025 ua a|||frm||| 000 0 eng d
040 ## - CATALOGING SOURCE
Original cataloguing agency EG-GICUC
Language of cataloging eng
Transcribing agency EG-GICUC
Modifying agency EG-GICUC
Description conventions rda
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title eng
Language code of summary or abstract eng
-- ara
049 ## - Acquisition Source
Acquisition Source Deposit
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.31
092 ## - LOCALLY ASSIGNED DEWEY CALL NUMBER (OCLC)
Classification number 006.31
Edition number 21
097 ## - Degree
Degree M.Sc
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Local Call Number Cai01.18.04.M.Sc.2025.Om.P
100 0# - MAIN ENTRY--PERSONAL NAME
Authority record control number or standard number Omar Ahmed Mohamed Ahmed Afifi,
Preparation preparation.
245 10 - TITLE STATEMENT
Title Panel data analysis using supervised machine learning techniques /
Statement of responsibility, etc. by Omar Ahmed Mohamed Ahmed Afifi ; supervised by Prof. Salah Mahdy Ramadan, Dr. Amal Mohamed Abdel Fatah.
246 15 - VARYING FORM OF TITLE
Title proper/short title تحليل بيانات القطاع باستخدام تقنيات التعلم الآلي الخاضعة للإشراف
264 #0 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Date of production, publication, distribution, manufacture, or copyright notice 2025.
300 ## - PHYSICAL DESCRIPTION
Extent 72 Leaves :
Other physical details illustrations ;
Dimensions 30 cm. +
Accompanying material CD.
336 ## - CONTENT TYPE
Content type term text
Source rda content
337 ## - MEDIA TYPE
Media type term Unmediated
Source rdamedia
338 ## - CARRIER TYPE
Carrier type term volume
Source rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note Thesis (M.Sc)-Cairo University, 2025.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note Bibliography: pages 64-69.
520 #3 - SUMMARY, ETC.
Summary, etc. Panel data analysis allows researchers to achieve greater statistical validity in policy analysis and program evaluation through more advanced research designs than cross-sectional data models. Panel (or longitudinal) data refers to data collected from the same individuals across multiple time periods. This data type consists of repeated time-series observations (𝑇) for a significant number of cross-sectional units (𝑁), such as countries, companies, or randomly chosen individuals.

This thesis compares three conventional panel data models, referred to as statistical panel models (Pooled OLS, Fixed Effects, and Random Effects), with three supervised machine learning techniques (Support Vector Regression, Random Forest Regressor, and Gradient Boosting Regressor) that have been used in the literature to model panel data. The comparison is made in terms of prediction performance by fitting each of the six models, calculating diagnostic metrics (MSE, Bias, AIC, and BIC), and comparing the resulting values across models.

The first comparison is an empirical study that investigates the impact of education and experience on individual wages using panel data from Greene (2008). This dataset was analyzed using the six models: three classical statistical panel data models (POLS, FE, RE) and three supervised machine learning techniques (SVR, RFR, GBR). The empirical results show that the machine learning techniques outperform the statistical models across all evaluation metrics, including Mean Squared Error (MSE), Bias, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). Among the machine learning techniques, Gradient Boosting and Support Vector Regression achieve the most accurate and efficient fits. The statistical models exhibit relatively higher error and complexity, with the Fixed Effects model performing the worst due to its exclusion of important time-invariant regressors.

The second comparison is based on a controlled simulation study using an assumed true data-generating process (DGP), evaluated across 16 combinations of cross-sectional units (𝑁 = 10, 50, 100, 200) and time periods (𝑇 = 10, 50, 100, 200). Each scenario was simulated over 1000 iterations to obtain stable average metrics. The findings reveal that the statistical panel data models, particularly Pooled OLS and Random Effects, consistently achieve near-zero bias across all configurations, while Fixed Effects suffers from persistent bias due to model misspecification. Meanwhile, the machine learning techniques demonstrate superior predictive performance, achieving substantially lower Mean Squared Error (MSE), AIC, and BIC values, especially as the panel size increases. Among the ML models, Gradient Boosting consistently provides the most accurate and well-balanced results, highlighting its strength in capturing complex relationships in data-rich panel structures.

The final part of the thesis recommends, for future work, exploring machine learning techniques other than the three used, introducing more values of 𝑁 and 𝑇 for simulation, simulating different panel data settings (unbalanced, dynamic, etc.), and repeating the simulation with different DGPs to determine whether the comparison results change.
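The simulation design summarized above (a 4 x 4 grid of 𝑁 and 𝑇 values, repeated draws, averaged metrics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the random-intercept DGP, the single regressor, the pooled OLS fit standing in for the six models, the least-squares forms of AIC and BIC (defined up to an additive constant), and all function names.

```python
import math
import random
from itertools import product

SIZES = (10, 50, 100, 200)
GRID = list(product(SIZES, SIZES))  # the 16 (N, T) scenarios


def simulate_panel(n_units, n_periods, rng, beta=1.0):
    """Hypothetical DGP (an assumption): y_it = beta * x_it + alpha_i + eps_it."""
    x, y = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)  # unit-specific random intercept
        for _ in range(n_periods):
            x_it = rng.gauss(0, 1)
            x.append(x_it)
            y.append(beta * x_it + alpha + rng.gauss(0, 1))
    return x, y


def pooled_ols(x, y):
    """Pooled OLS with one regressor and an intercept, ignoring the panel structure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_metrics(y, y_hat, k):
    """MSE, bias, and least-squares AIC/BIC for k fitted parameters (requires RSS > 0)."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    bias = sum(b - a for a, b in zip(y, y_hat)) / n  # mean(prediction - observed)
    log_term = n * math.log(rss / n)
    return {
        "mse": rss / n,
        "bias": bias,
        "aic": log_term + 2 * k,
        "bic": log_term + k * math.log(n),
    }


def run_grid(reps=5, seed=0):
    """Average each in-sample metric over `reps` simulated panels per (N, T) scenario."""
    rng = random.Random(seed)
    results = {}
    for n_units, n_periods in GRID:
        totals = {"mse": 0.0, "bias": 0.0, "aic": 0.0, "bic": 0.0}
        for _ in range(reps):
            x, y = simulate_panel(n_units, n_periods, rng)
            intercept, slope = pooled_ols(x, y)
            y_hat = [intercept + slope * x_it for x_it in x]
            for name, value in fit_metrics(y, y_hat, k=2).items():
                totals[name] += value
        results[(n_units, n_periods)] = {m: v / reps for m, v in totals.items()}
    return results
```

Calling `run_grid(reps=1000)` would mirror the 1000 iterations mentioned in the summary, at a correspondingly higher runtime. Because the sketch scores in-sample fits of an OLS model with an intercept, its bias is zero by construction, so it says nothing about how the thesis measured bias or predictive performance.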
520 #3 - SUMMARY, ETC.
Summary, etc. In this thesis, three conventional methods for estimating panel data, known as statistical panel models (pooled linear regression, the fixed effects model, and the random effects model), are compared with three supervised machine learning techniques (support vector regression, random forest regression, and gradient boosting regression) that have been used in the literature to model panel data. The comparison was made in terms of estimation accuracy by fitting each of the six models and computing several diagnostic metrics (such as mean squared error, bias, the Akaike criterion, and the Bayesian criterion), then comparing the resulting values across models. The first comparison used a real-data example on the effect of years of experience and education on the wages of working individuals. The empirical results showed that the three machine learning techniques clearly outperformed the classical panel models on all diagnostic metrics. The second comparison used simulated data in 16 different combinations; each simulation experiment was run over 1000 iterations to ensure stable averages of the diagnostic metrics. The machine learning techniques showed higher bias in small samples, but this bias decreased markedly as the data size increased, with gradient boosting regression achieving the lowest bias at the largest sample sizes. The machine learning techniques also outperformed the classical models in mean squared error and in the Akaike and Bayesian criteria, especially as the data size grew.
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE
Issues CD Issued also as CD.
546 ## - LANGUAGE NOTE
Text Language Text in English and abstract in Arabic & English.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Machine learning
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element التعلم الآلي
653 #1 - INDEX TERM--UNCONTROLLED
Uncontrolled term Panel Data Analysis
-- Statistical Panel Models
-- Pooled OLS
-- Fixed Effects
-- Random Effects
-- Panel Data using Machine Learning
-- Support Vector Regression (SVR)
-- Random Forest Regressor (RFR)
-- Gradient Boosting Regressor (GBR)
-- Panel Data Simulation
-- تحليل بيانات البانل
-- نماذج البانل الإحصائية
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name Salah Mahdy Ramadan
Relator term thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name Amal Mohamed Abdel Fatah
Relator term thesis advisor.
900 ## - Thesis Information
Grant date 01-01-2025
Supervisory body Salah Mahdy Ramadan
-- Amal Mohamed Abdel Fatah
Universities Cairo University
Faculties Faculty of Graduate Studies for Statistical Research
Department Department of Applied Statistics and Econometrics
905 ## - Cataloger and Reviser Names
Cataloger Name Shimaa
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Edition 21
Suppress in OPAC No
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Home library New Central Library - Cairo University
Current library University Theses Hall - First Floor
Date acquired 12.03.2026
Inventory number 93534
Full call number Cai01.18.04.M.Sc.2025.Om.P
Barcode 01010110093534000
Date last seen 12.03.2026
Effective from 12.03.2026
Koha item type Thesis
Cairo University Libraries Portal Implemented & Customized by: Eng. M. Mohamady Contacts: new-lib@cl.cu.edu.eg | cnul@cl.cu.edu.eg
© All rights reserved — Cairo University Libraries