Improving deep learning techniques using ensemble methods in text classification / by Rania Abd El.Monam kora ; supervised by Prof. Ammar Mohammed, Dr. Mervat Gheith.

By:

Rania Abd El.Monam kora [preparation.]

Material type: Text

Language: English

Summary language: English, Arabic

Producer: 2024

Description: 163 leaves : illustrations ; 30 cm. + CD

Content type: text

Media type: unmediated

Carrier type: volume

Other title:

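تحسين تقنيات التعلم العميق باستخدام طرق التجميع في تصنيف النصوص (title from the added Arabic title page; in English: Improving deep learning techniques using ensemble methods in text classification)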

Available additional physical forms:

Issued also as CD.

Dissertation note: Thesis (Ph.D)-Cairo University, 2024.

Summary: Over the last decade, deep learning-based models have surpassed classical machine learning models in a variety of text classification tasks. The primary challenge in text classification is determining the most appropriate deep learning classifier. Numerous research efforts have incorporated ensemble learning to boost performance, reduce errors, and avoid overfitting. Despite the power of ensemble deep learning methods to improve prediction performance, most of the ensemble deep learning literature applies only majority-voting algorithms, because of their simplicity. However, relying solely on this approach is not an effective way to combine models: because every model's vote counts equally, it can be biased toward weak models, which can reduce performance. This thesis therefore makes the following contributions. First, it proposes a new Meta-Ensemble Deep Learning Algorithm that fuses baseline deep learning models using two levels of meta-classifiers. Second, it conducts experiments on eleven benchmark text classification datasets covering several languages and dialects to test and evaluate the performance of the proposed algorithm. For each benchmark dataset, committees of different deep baseline classifiers are trained, and their best performance is compared with that of the proposed algorithm. In particular, 314 deep models are trained, and five different shallow meta-classifiers are compared for ensembling those models. The thesis also extends the results by comparing the proposed algorithm with other state-of-the-art ensemble methods. Third, it creates the benchmark dataset "Arabic-Egyptian Dataset 2" as an addition to the original version, known as the "Arabic-Egyptian Dataset".
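The argument against plain majority voting can be illustrated with a small hypothetical example. In unweighted (hard) majority voting every base model counts once, so several weak models that agree can outvote a single strong one. The sketch below contrasts that with a vote weighted by each model's held-out accuracy, using the classical log-odds weight log(a / (1 - a)), which is optimal for two classes with equal priors only when the voters err independently. This is not the thesis's meta-ensemble algorithm; the function names, labels, and accuracies are invented for illustration.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Unweighted (hard) majority vote: every base model's label counts once."""
    return Counter(labels).most_common(1)[0][0]


def log_odds_weight(accuracy):
    """Voting weight log(a / (1 - a)) for a binary voter with held-out accuracy a.
    Optimal (two classes, equal priors) only if the voters' errors are independent."""
    return math.log(accuracy / (1.0 - accuracy))


def weighted_vote(labels, accuracies):
    """Add up each label's voter weights and return the highest-scoring label."""
    scores = {}
    for label, acc in zip(labels, accuracies):
        scores[label] = scores.get(label, 0.0) + log_odds_weight(acc)
    return max(scores, key=scores.get)


# Hypothetical committee output for one text: one strong model, three weak ones.
labels = ["positive", "negative", "negative", "negative"]
accuracies = [0.92, 0.55, 0.56, 0.58]
print(majority_vote(labels))              # negative
print(weighted_vote(labels, accuracies))  # positive
```

With three weak models agreeing, the unweighted vote returns `negative`, while the weighted vote follows the strong model (weight about 2.44 against a combined 0.76). A meta-classifier goes a step further by learning how to combine the base models from data rather than fixing the combination by formula.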
The findings indicate that, using soft prediction, the proposed algorithm significantly increases the classification accuracy of the baseline deep models on all eleven benchmark datasets: by 4.1% (Arabic-Egyptian Dataset), 16.03% (AJGT), 5.2% (ArSarcasm), 10.58% (Saudi Arabia Tweets), 8.37% (ASTD), 9.1% (ArSenTD-LEV), 10.44% (IMDB Review), 12.05% (SemEval), 3.55% (COVID19-Fake), 7.52% (Movie Reviews) and 6.55% (Twitter US Airline Sentiment). The proposed ensemble algorithm also outperforms state-of-the-art ensemble methods on the same benchmark datasets.

Summary (Arabic abstract, translated): Over the past decade, deep learning-based models have outperformed classical machine learning models in a variety of text classification tasks. The main challenge in text classification is identifying the most suitable deep learning classifier. Many research initiatives have incorporated ensemble learning to enhance performance, reduce errors, and avoid overfitting. Despite the strength of ensemble deep learning systems in improving prediction performance, most of the ensemble deep learning literature focuses only on applying majority-voting algorithms, owing to their simplicity. However, relying on this method alone is not a smart way to combine models, because it is biased toward weak models, which may reduce performance. This thesis therefore makes the following contributions. First, it proposes a Meta-Ensemble Deep Learning Algorithm that uses two levels of meta-classifiers. Second, it conducts numerous experiments on eleven public benchmark datasets for text classification, covering several languages and dialects, to test and evaluate the performance of the proposed meta-ensemble deep learning approach. For each benchmark dataset, committees of different deep baseline classifiers are trained, and their best performance is compared with that of the proposed algorithm. In particular, we trained 314 deep models and compared five different shallow meta-classifiers for ensembling those models. Furthermore, the results were extended by comparing the proposed algorithm with other recent ensemble methods.
Third, the benchmark dataset Arabic-Egyptian Dataset 2 was created and added to the original version, known as the Arabic-Egyptian Dataset. The results indicate that, using soft prediction, the proposed algorithm substantially increases the classification accuracy of the baseline deep models on all benchmark datasets (Arabic-Egyptian Dataset, AJGT, ArSarcasm, Saudi Arabia Tweets, ASTD, ArSenTD-LEV, IMDB Review, SemEval, COVID19-Fake, Movie Reviews and Twitter US Airline Sentiment) by 4.1%, 16.03%, 5.2%, 10.58%, 8.37%, 9.1%, 10.44%, 12.05%, 3.55%, 7.52% and 6.55%, respectively. The proposed algorithm also outperforms recent ensemble methods.
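The summary gives no implementation details for the two levels of meta-classifiers, so the following is only a minimal sketch of generic two-level stacking on soft predictions (base-model class probabilities rather than hard labels), under several assumptions. Base models are grouped into hypothetical committees (`cnn`, `lstm`); each committee's positive-class probabilities feed one level-1 meta-classifier, and a level-2 meta-classifier combines the level-1 outputs. `LogisticMeta`, `fit_two_level`, `fake_base_model`, and the synthetic probabilities are all hypothetical, and a tiny logistic regression stands in for whichever of the five shallow meta-classifiers is used. Level 1, level 2, and evaluation use disjoint rows, so level 2 is fit on level-1 outputs for texts that level 1 did not train on.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticMeta:
    """Hypothetical stand-in for a shallow meta-classifier: binary logistic
    regression fitted by batch gradient descent on the log-loss."""

    def __init__(self, lr=0.5, epochs=300):
        self.lr, self.epochs = lr, epochs

    def proba(self, x):
        """Positive-class probability for one meta-feature vector."""
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.proba(xi) - yi  # derivative of log-loss w.r.t. the logit
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self


def centred(row):
    """Soft predictions (positive-class probabilities) shifted to be centred at 0."""
    return [p - 0.5 for p in row]


def level1_features(level1, committees, rows):
    """For each text, one centred output per level-1 meta-classifier."""
    return [[level1[c].proba(centred(committees[c][i])) - 0.5 for c in level1]
            for i in rows]


def fit_two_level(committees, y, l1_rows, l2_rows):
    """Level 1: one meta-classifier per committee, fitted on l1_rows.
    Level 2: one meta-classifier over level-1 outputs, fitted on disjoint l2_rows."""
    level1 = {c: LogisticMeta().fit([centred(probs[i]) for i in l1_rows],
                                    [y[i] for i in l1_rows])
              for c, probs in committees.items()}
    z = level1_features(level1, committees, l2_rows)
    level2 = LogisticMeta().fit(z, [y[i] for i in l2_rows])
    return level1, level2


# Synthetic soft predictions from hypothetical base models (not thesis data).
rng = random.Random(0)
y = [i % 2 for i in range(300)]


def fake_base_model(label, quality):
    """Positive-class probability that leans toward the true label by `quality`."""
    centre = 0.5 + quality if label else 0.5 - quality
    return min(max(centre + rng.uniform(-0.15, 0.15), 0.01), 0.99)


committees = {  # hypothetical committees of base models, e.g. CNNs and LSTMs
    "cnn": [[fake_base_model(t, 0.30), fake_base_model(t, 0.35)] for t in y],
    "lstm": [[fake_base_model(t, 0.30), fake_base_model(t, 0.25)] for t in y],
}
level1, level2 = fit_two_level(committees, y, range(0, 100), range(100, 200))
test_rows = range(200, 300)
scores = [level2.proba(z) for z in level1_features(level1, committees, test_rows)]
accuracy = sum((s >= 0.5) == bool(y[i]) for s, i in zip(scores, test_rows)) / len(test_rows)
print(f"held-out accuracy of the two-level sketch: {accuracy:.2f}")
```

Fixed disjoint splits keep the sketch short; out-of-fold (cross-validated) predictions are a common alternative that lets every level use all of the training data.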

Holdings
Item type	Current library	Home library	Call number	Status	Barcode
Thesis	University Theses Hall - First Floor	New Central Library - Cairo University	Cai01.18.02.Ph.D.2024.Ra.I	Not for loan	01010110090154000

Bibliography: pages 127-162.

Text in English and abstract in Arabic & English.
