000 07028namaa22004211i 4500
003 EG-GICUC
005 20260204102824.0
008 260124s2025 ua a|||frm||| 000 0 eng d
040 _aEG-GICUC
_beng
_cEG-GICUC
_dEG-GICUC
_erda
041 0 _aeng
_beng
_bara
049 _aDeposit
082 0 4 _a006.31
092 _a006.31
_221
097 _aM.Sc
099 _aCai01.18.11.M.Sc.2025.Sa.M
100 0 _aSamy Elsayed Teleb Hassan,
_epreparation.
245 1 0 _aMachine learning approach for detecting artificial inflation of SMS traffic /
_cby Samy Elsayed Teleb Hassan ; Supervision Prof. Dr. Ammar Mohammed Ammar Mohammed.
246 1 5 _aنهج تعلم الآلة لاكتشاف التضخم الاصطناعي لحركة الرسائل النصية
264 0 _c2025.
300 _a88 leaves :
_billustrations ;
_c30 cm. +
_eCD.
336 _atext
_2rdacontent
337 _aunmediated
_2rdamedia
338 _avolume
_2rdacarrier
502 _aThesis (M.Sc)-Cairo University, 2025.
504 _aBibliography: pages 81-88.
520 3 _aDetecting artificially inflated SMS traffic is crucial in telecommunications for maintaining network integrity amid rising fraudulent activity such as SMS spamming. This thesis explores machine learning techniques for identifying such anomalies, comparing four models: RandomForestClassifier, GradientBoostingClassifier, Support Vector Machine (SVM), and Naive Bayes (GaussianNB). The study uses a dataset of 12,500 SMS instances: 10,000 labeled as artificially inflated traffic (AIT) and 2,500 as normal. The features examined include sender and recipient numbers, mobile country code (MCC), mobile network code (MNC), country, and network, all preprocessed for modeling. The primary goal is to evaluate how effectively each model distinguishes artificially inflated SMS traffic from normal messages, using per-class precision, recall, and F1-score. RandomForestClassifier achieved high accuracy, with a precision of 0.91 for AIT and 1.00 for normal traffic; however, its recall for normal traffic was lower (0.59), indicating difficulty in identifying some normal instances. The SVM model, despite its proficiency in high-dimensional spaces, showed lower precision and recall for normal traffic, suggesting difficulty in classifying non-inflated messages. GaussianNB performed poorly on normal traffic, with precision, recall, and F1-score near zero, reflecting its limitations on complex or imbalanced datasets. In contrast, GradientBoostingClassifier delivered promising results, with metrics comparable to those of RandomForestClassifier, demonstrating its effectiveness in detecting artificially inflated SMS traffic. The thesis analyzes each model's strengths and weaknesses and identifies contexts where one may be more suitable than the others. RandomForestClassifier and GradientBoostingClassifier emerged as the most reliable detectors of artificial inflation, with GradientBoostingClassifier showing greater resilience to class imbalance.
SVM's performance was hindered by its inability to identify normal traffic effectively in this scenario, while GaussianNB's simplifying assumptions limited its utility. This evaluation underscores the importance of selecting machine learning models according to data characteristics and task requirements. While advanced models such as RandomForest and GradientBoosting offer superior performance in this domain, simpler models such as Naive Bayes or SVM may still be applicable in specific scenarios with proper data preparation. Future research could explore other algorithms, hyperparameter optimization, and domain-specific features to improve detection accuracy. In conclusion, the study confirms that machine learning can effectively identify artificially inflated SMS traffic, giving telecom operators and fraud detection systems viable means of mitigating SMS-based fraud. The comparative analysis offers insights for selecting suitable techniques based on data availability, computational resources, and the objectives of a specific detection system.
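The abstract reports precision, recall, and F1-score separately for AIT and normal traffic on a 10,000 : 2,500 dataset. The sketch below is illustrative only and is not the thesis's code: it computes these metrics one-vs-rest per class in plain Python and applies them to a hypothetical degenerate classifier that labels every message as AIT. This classifier is an assumption chosen to show why overall accuracy is misleading under this imbalance. Undefined precision (no predictions for a class) is set to 0.0, mirroring scikit-learn's `zero_division=0` convention.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    results = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Undefined ratios (zero denominator) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results


# Class sizes taken from the abstract: 10,000 AIT and 2,500 normal messages.
y_true = ["AIT"] * 10_000 + ["normal"] * 2_500
# Hypothetical baseline (not one of the thesis's models): predict AIT for everything.
y_pred = ["AIT"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred, ["AIT", "normal"])

print(f"accuracy = {accuracy:.2f}")
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running it prints `accuracy = 0.80`, `AIT: precision=0.80 recall=1.00 f1=0.89`, and `normal: precision=0.00 recall=0.00 f1=0.00`. A model can therefore reach 80% accuracy while never detecting a single normal message. This is why the per-class figures for normal traffic, such as GaussianNB's near-zero scores, carry the discriminating information.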
520 3 _aArtificial inflation of Short Message Service (SMS) traffic is one of the most serious challenges facing telecommunications companies because of its negative financial and operational effects. This study aims to develop a predictive model based on machine learning techniques to detect artificial text messages, which are often used in fraudulent attacks such as the abuse of one-time password (OTP) verification codes. The study analyzes real data comprising 12,500 short messages, of which 10,000 were classified as artificially inflated traffic and 2,500 as normal messages. A number of features related to the sender, the recipient, and network information such as the mobile country code (MCC) and network operator (MNC) were extracted and analyzed, and the data were processed using appropriate encoding and cleaning techniques. Four classification models were built and evaluated using machine learning algorithms: Random Forest, Gradient Boosting, Support Vector Machine (SVM), and Gaussian Naive Bayes. The results showed that the Random Forest and Gradient Boosting models achieved the highest performance, detecting artificial messages with high accuracy, while the SVM and Naive Bayes algorithms performed worse, especially on imbalanced data. This study makes a practical contribution to strengthening the security of telecommunications networks by providing an applicable framework that enables service providers to detect artificial message inflation early, supporting improved operational efficiency and reduced financial losses.
530 _aIssued also as CD.
546 _aText in English and abstract in Arabic & English.
650 0 _aMachine learning
650 0 _aتعلم الآلة
653 1 _aMachine Learning
_aFraud Detection
_aSMS
_aArtificial Inflation
_aData Classification
_aPattern Recognition
_aMobile Networks
_aالتعلم الآلي
_aكشف الاحتيال
700 0 _aAmmar Mohammed Ammar Mohammed,
_ethesis advisor.
900 _b01-01-2025
_cAmmar Mohammed Ammar Mohammed
_UCairo University
_FFaculty of Graduate Studies for Statistical Research
_DDepartment of Data Science
905 _aShimaa
_eEman Ghareb
942 _2ddc
_cTH
_e21
_n0
999 _c177999