
A contribution on Arabic text summarization using deep learning / (Record no. 169637)

MARC details
000 -LEADER
fixed length control field	09406namaa22004091i 4500
003 - CONTROL NUMBER IDENTIFIER
control field	OSt
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20250102142928.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	241222s2023 |||a|||f m||| 000 0 eng d
040 ## - CATALOGING SOURCE
Original cataloguing agency	EG-GICUC
Language of cataloging	eng
Transcribing agency	EG-GICUC
Modifying agency	EG-GICUC
Description conventions	rda
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	eng
Language code of summary or abstract	eng
--	ara
049 ## - Acquisition Source
Acquisition Source	Deposit
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	004
092 ## - LOCALLY ASSIGNED DEWEY CALL NUMBER (OCLC)
Classification number	004
Edition number	21
097 ## - Degree
Degree	Ph.D
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Local Call Number	Cai01.12.02.Ph.D.2023.As.C
100 0# - MAIN ENTRY--PERSONAL NAME
Authority record control number or standard number	Asmaa Elsaid Mohamed Elsayed,
Preparation	preparation.
245 12 - TITLE STATEMENT
Title	A contribution on Arabic text summarization using deep learning /
Statement of responsibility, etc.	by Asmaa Elsaid Mohamed Elsayed ; Supervised by Prof. Lamiaa Fattouh Ibrahim, Prof. Ammar Mohammed Ammar.
246 15 - VARYING FORM OF TITLE
Title proper/short title	/ المساهمة في تلخيص النصوص العربيه باستخدام التعلم العميق
264 #0 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Date of production, publication, distribution, manufacture, or copyright notice	2023.
300 ## - PHYSICAL DESCRIPTION
Extent	111 leaves :
Other physical details	illustrations ;
Dimensions	30 cm. +
Accompanying material	CD.
336 ## - CONTENT TYPE
Content type term	text
Source	rda content
337 ## - MEDIA TYPE
Media type term	Unmediated
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Source	rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (Ph.D)-Cairo University, 2023.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note	Bibliography: pages 95-111.
520 ## - SUMMARY, ETC.
Summary, etc.	Text summarization is essential in natural language processing as data volume grows quickly. The increasing volume and complexity of daily textual data, including social media posts, news articles, emails, and text messages, make it difficult to consume and process all the information. Finding relevant information means manually sifting through large volumes of text, which is time-consuming and difficult. Users therefore need to condense that data into meaningful text quickly. Text summarization addresses this challenge by automatically condensing text into a more concise format so that users can quickly and easily access the most critical information. In today's data-driven environment, it has become an essential tool used in multiple contexts, including text analysis, content-based recommendations, and information retrieval.
There are three standard methods of text summarization: extractive, abstractive, and hybrid. Many efforts have addressed summarizing Latin-script texts. However, summarizing Arabic texts is challenging for many reasons, including the language's complexity, structure, and morphology. There is also a need for benchmark data sources and gold-standard evaluation metrics for Arabic summaries.
Thus, the contribution of this thesis is multifold. First, it proposes a hybrid approach consisting of a modified sequence-to-sequence (MSTS) algorithm and a transformer-based model. MSTS modifies the sequence-to-sequence model by adding multi-layer encoders and a one-layer decoder to its structure. The output of the MSTS model is an extractive summary. To generate the abstractive summary, a transformer-based model then processes the extractive summary. Second, it introduces a dataset with long texts: a new Arabic benchmark dataset called the Hybrid Arabic Text Summarization Dataset (HASD), which includes 43k articles with their extractive and abstractive summaries. Third, this work modifies the well-known extractive Essex Arabic Summaries Corpus (EASC) benchmark by adding an abstractive summary for each text. Fourth, an evaluation measure called eval-summ identifies the most precise summary among the multiple summaries produced by the model. Fifth, this thesis proposes a new measure for abstractive summaries, called Arabic-ROUGE, which depends on structure and similarity between words. Finally, it investigates the impact of abstractive Arabic text summarization on different transformer models with other datasets.
The model is tested on the proposed HASD and modified EASC benchmarks and evaluated using ROUGE, BLEU, and Arabic-ROUGE. On the EASC extractive dataset, the ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores are 65, 56, 63, and 42, respectively. On the proposed HASD extractive dataset, they are 82.54, 79.03, 81.51, and 44, respectively. On the benchmark EASC abstractive dataset, the ROUGE-1, ROUGE-2, ROUGE-L, BLEU, and Arabic-ROUGE scores are 64, 49, 61, 42, and 68, while on the HASD dataset they are 76, 62, 75, 44, and 82. These results are satisfactory compared with those reported in the literature.
520 ## - SUMMARY, ETC.
Summary, etc.	Text summarization is essential in natural language processing as the volume of data grows rapidly. The growing volume and complexity of everyday textual data, including social media posts, news articles, emails, and text messages, makes it difficult to consume and process all the information. It also means manually searching through large volumes of text to find relevant information, which can be time-consuming and difficult. The user therefore needs to summarize that data into meaningful, understandable text in a short time.
Text summarization addresses this challenge by automatically condensing text into a more concise format so that users can quickly and easily reach the most important information. In today's data-driven environment, it has become an essential tool used in many contexts, including text analysis, content-based recommendation, and information retrieval.
There are two standard methods of text summarization: extractive and abstractive. There have been many efforts to summarize Latin texts. However, summarizing Arabic texts is challenging for many reasons: the scarcity of resources (NLP tools), synonyms, and the linguistics of Arabic, including the language's complexity, structure, and morphology. There is also a need for benchmark data sources and gold-standard Arabic summary evaluation metrics. The contribution of this study is therefore multifold. First, it proposes a hybrid approach consisting of a modified sequence-to-sequence (MSTS) algorithm and a transformer-based model. The sequence-to-sequence model is modified by adding multi-layer encoders and a single-layer decoder to its structure. The output of the MSTS model is an extractive summary. To generate the abstractive summary, a transformer-based model processes the extractive summary. Second, it introduces a new dataset of long texts: a new Arabic benchmark dataset called HASD, which includes 43,000 articles with their extractive and abstractive summaries.
Third, this work modifies the well-known extractive EASC benchmark by adding an abstractive summary for each text. Fourth, an evaluation measure called eval_summary identifies the accurate summary among multiple summaries. This measure helps identify the most accurate summary among the multiple summaries the models produce.
Fifth, the study proposes a new measure for abstractive summaries, called the Arabic Rouge measure, based on structure and similarity between words. Finally, it investigates the impact of abstractive Arabic text summarization on different transformer models with other datasets. The model was tested on the proposed HASD and modified EASC benchmarks and evaluated using Rouge, Bleu, and Arabic Rouge. The experimental results are satisfactory compared with state-of-the-art methods.
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE
Issues CD	Issues also as CD.
546 ## - LANGUAGE NOTE
Text Language	Text in English and abstract in Arabic & English.
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Computer Science
Source of heading or term	qrmak
653 #0 - INDEX TERM--UNCONTROLLED
Uncontrolled term	Natural Language Processing
--	Deep Learning
--	Arabic Text Summarization
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Lamiaa Fattouh Ibrahim
Relator term	thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Ammar Mohammed Ammar
Relator term	thesis advisor.
900 ## - Thesis Information
Grant date	01-01-2023
Supervisory body	Lamiaa Fattouh Ibrahim
--	Ammar Mohammed Ammar
Universities	Cairo University
Faculties	Faculty of Graduate Studies for Statistical Research
Department	Department of Computer Science
905 ## - Cataloger and Reviser Names
Cataloger Name	Shimaa
Reviser Names	Huda
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Thesis
Edition	21
Suppress in OPAC	No

Holdings
Source of classification or shelving scheme	Home library	Current library	Date acquired	Inventory number	Full call number	Barcode	Date last seen	Effective from	Koha item type
Dewey Decimal Classification	New Central Library - Cairo University	Theses Hall - First Floor	22.12.2024	89557	Cai01.12.02.Ph.D.2023.As.C	01010110089557000	22.12.2024	22.12.2024	Thesis