Predicting unsettled credit for social security organizations using data mining techniques / by Shaimaa Mohamed Mohamed Ali ; Supervision Prof. Dr. Nagy Ramadan, Dr. Abdelmoneim Helmy.

By:

Shaimaa Mohamed Mohamed Ali [preparation.]

Material type:

Text

Language:

English

Summary language:

English, Arabic

Producer:

2025

Description:

134 Leaves : illustrations ; 30 cm. + CD

Content type:

text

Media type:

Unmediated

Carrier type:

volume

Other title:

التنبؤ بالديون غير المسددة لمؤسسات التضامن الاجتماعي باستخدام تقنيات التنقيب في البيانات [Added title page title]

DDC classification:

005.12

Available additional physical forms:

Issued also as CD.

Dissertation note: Thesis (M.Sc)-Cairo University, 2025.


Holdings
Item type	Current library	Home library	Call number	Status	Barcode
Thesis	Theses Hall - First Floor	New Central Library - Cairo University	Cai01.18.07.MSc.2025.Sh.P	Not for loan	01010110093537000


Bibliography: pages 118-127.

Unsettled credit is a critical issue that affects banks and organizations worldwide.
The ability to accurately predict the level of credit default provides valuable insight
into the state of the economy. The social security model is information-driven: it
produces volumes of accumulated records that are too large for traditional data
processing methods to handle. Data mining helps investment organizations, banks, and
insurance companies discover patterns in customer data that are useful for predicting
credit default. This thesis focuses on credit default prediction, with the aim of
helping decision-makers find suitable procedures to prevent or minimize debt.
This thesis presents and surveys some recent research related to credit default
prediction using data mining. Moreover, the thesis provides a brief review of the social
security organization in Egypt and its unresolved debt problem.
The thesis contributes a unique dataset collected from the National Organization for
Social Insurance (NOSI) over a span of 13 years, along with implementation details of
each phase of a proposed approach for credit default prediction using data mining.
The process begins with the data extraction and preparation phase, followed by the
pre-processing phase, which replaces missing values with the mean or the most frequent
value, standardizes the data, and removes outliers and extreme values. Various
resampling methods were then applied, including Random Over-sampling, Synthetic
Minority Oversampling Technique (SMOTE), Mahalanobis Distance-based Over-sampling,
Adaptive Synthetic Sampling Approach (ADASYN), Multi-Class Cost-Sensitive Learning,
and Similarity Oversampling and Undersampling Pre-processing.
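
The pre-processing and resampling steps named above can be sketched in a few lines. This is a minimal illustration and not the thesis implementation: the function names (impute_mean, standardize, random_oversample), the toy values, and the choice of Random Over-sampling as the example resampler are all assumptions made here for clarity.

```python
import random
from collections import Counter
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column: Sequence[float]) -> List[float]:
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def random_oversample(
    rows: Sequence[list], labels: Sequence[int], seed: int = 0
) -> Tuple[List[list], List[int]]:
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy column and labels, invented for illustration (not NOSI data).
amounts = [100.0, None, 300.0, 200.0]
clean = standardize(impute_mean(amounts))
rows = [[v] for v in clean]
labels = [0, 0, 0, 1]  # hypothetical encoding: 1 = default, 0 = no default
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

In common practice, resampling is applied to the training portion only, so that duplicated or synthetic rows do not leak into the test set; the abstract does not state how the thesis handled this.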
In the next phase, the unsupervised k-means algorithm was applied, followed by
tree-based supervised algorithms: Decision Tree, Random Forest, and eXtreme Gradient
Boosting. The dataset was divided into 70% for training and 30% for testing to assess
the approach's performance.
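
The abstract does not say how the k-means output is passed to the supervised algorithms. One common way to build such a hybrid, assumed here purely for illustration, is to fit k-means on the training rows and append each row's cluster index as an extra feature. The sketch below shows a 70/30 split and that augmentation step; split_70_30, kmeans, nearest, and the toy points are hypothetical names and data, and the classifier itself is left out.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def split_70_30(rows: Sequence[Point], seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of the rows and return (train, test) in a 70/30 split."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: _sq_dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[list]:
    """Plain Lloyd's algorithm; returns the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        groups: List[list] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


# Two well-separated toy groups, invented for illustration.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9],
        [4.9, 5.0], [0.0, 0.0], [5.1, 5.2], [0.1, 0.1], [5.0, 5.0]]
train, test = split_70_30(data)
centroids = kmeans(train, k=2)  # fitted on the training rows only
train_aug = [list(p) + [nearest(p, centroids)] for p in train]
test_aug = [list(p) + [nearest(p, centroids)] for p in test]
print(len(train_aug), len(test_aug))
```

The augmented rows would then be passed to a tree-based classifier, for example scikit-learn's RandomForestClassifier. Fitting the centroids on the training rows alone keeps the test set unseen until evaluation.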
The experiments yielded several results. For resampling, the thesis compares the
aforementioned resampling methods; overall, Similarity Oversampling and Undersampling
Pre-processing was the most successful. For classification, a comparison of the three
supervised learning algorithms indicates that Random Forest is the most powerful
algorithm. For clustering, the research showed that a hybrid of supervised and
unsupervised models leads to improved results. The hybrid of K-means, Similarity
Oversampling and Undersampling Pre-processing, and Random Forest is the most powerful
model, with an accuracy of 74.27%, an F1 score of 0.74, and a precision of 0.7381,
compared with an accuracy of 50%, an F1 score of 0.5039, and a precision of 0.5042 on
the original dataset. Furthermore, the proposed approach was applied to a publicly
available dataset and compared with prior research, and the comparison indicated that
the approach was successful.
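
For readers unfamiliar with the reported measures, the sketch below computes accuracy, precision, recall, and F1 from predicted and true labels. It assumes a binary labelling (1 = default) and uses invented toy labels; the abstract does not state whether the thesis figures are binary or averaged over several classes, so this is an illustration of the definitions only, and binary_metrics is a hypothetical helper name.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}, with 1 as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not thesis results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four measures equal 0.75.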
Consequently, this research verified the added value of the proposed approach in
predicting credit defaults through data mining. It could serve as an effective starting point
to assist decision-makers and actuaries in making informed decisions to prevent or
minimize debt.



Text in English and abstract in Arabic & English.
