
Enhancement of mispronunciation detection using deep learning techniques / (Record no. 172706)

MARC details
000 -LEADER
fixed length control field	07232namaa22004331i 4500
003 - CONTROL NUMBER IDENTIFIER
control field	OSt
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20250807111138.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	250623s2024 ua a|||frm||| 000 0 eng d
040 ## - CATALOGING SOURCE
Original cataloguing agency	EG-GICUC
Language of cataloging	eng
Transcribing agency	EG-GICUC
Modifying agency	EG-GICUC
Description conventions	rda
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	eng
Language code of summary or abstract	eng
--	ara
049 ## - Acquisition Source
Acquisition Source	Deposit
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.31
092 ## - LOCALLY ASSIGNED DEWEY CALL NUMBER (OCLC)
Classification number	006.31
Edition number	21
097 ## - Degree
Degree	M.Sc
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Local Call Number	Cai01.20.03.M.Sc.2024.Ah.E
100 0# - MAIN ENTRY--PERSONAL NAME
Authority record control number or standard number	Ahmed Ismail Meawed Zahran,
Preparation	preparation.
245 10 - TITLE STATEMENT
Title	Enhancement of mispronunciation detection using deep learning techniques /
Statement of responsibility, etc.	by Ahmed Ismail Meawed Zahran ; Supervision Prof. Aly Aly Fahmy, Prof. Khaled Wassif, Dr. Hanaa Mobarez.
246 15 - VARYING FORM OF TITLE
Title proper/short title	تحسين أساليب كشف أخطاء النطق باستخدام التعلم العميق [Improving mispronunciation detection methods using deep learning]
264 #0 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Date of production, publication, distribution, manufacture, or copyright notice	2024.
300 ## - PHYSICAL DESCRIPTION
Extent	111 leaves :
Other physical details	illustrations ;
Dimensions	30 cm. +
Accompanying material	CD.
336 ## - CONTENT TYPE
Content type term	text
Source	rda content
337 ## - MEDIA TYPE
Media type term	Unmediated
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Source	rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (M.Sc)-Cairo University, 2024.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note	Bibliography: pages 104-111.
520 #3 - SUMMARY, ETC.
Summary, etc.	In language learning applications, pronunciation assessment models are a necessary component for providing feedback on a user’s pronunciation skills. The pronunciation scoring literature has largely relied on feature-based models such as Goodness-of-Pronunciation (GOP) and on deep-learning-based speech recognition. In the past few years, transformer-based self-supervised learning (SSL) has enabled the introduction of large pre-trained models that can be used to produce powerful contextualized speech representations, which have shown improvements in several downstream tasks. We propose End-to-End Regressor (E2E-R), an end-to-end model for pronunciation scoring that is built by fine-tuning a pre-trained SSL model. E2E-R is developed using a two-stage approach. In the first stage, a pre-trained SSL model is fine-tuned on a phoneme recognition task, which results in a model that can produce accurate phoneme vector representations. In the second stage, a pronunciation scoring model is built using transfer learning. This model uses a Siamese neural network to compare pronounced phoneme representations with embeddings that represent the correct pronunciation of canonical phonemes. The result of the comparison is used as the pronunciation score. Experimental results show that our proposed model achieves a Pearson correlation coefficient (PCC) of 0.68 on the speechocean762 dataset, which is almost the same as the PCC achieved by the state-of-the-art GOPT (PAII-A), without the need for additional native speech data, feature engineering, or an external forced alignment module. To the best of our knowledge, this work represents the first use of a pre-trained SSL model in end-to-end phoneme-level pronunciation scoring.
520 #3 - SUMMARY, ETC.
Summary, etc.	Pronunciation assessment models are an essential component of language learning applications, as they help provide feedback on users' pronunciation skills. Research on pronunciation assessment relies largely on feature-based models such as Goodness of Pronunciation (GOP), as well as on deep-learning-based speech recognition techniques. In the past few years, self-supervised learning with transformer models has made large pre-trained models available; these models can be used to produce accurate speech representations that are shaped by the context of the speech. These representations have in turn helped improve many applications of speech recognition technology. In this thesis, we propose End-to-End Regressor (E2E-R), an end-to-end model for pronunciation assessment built by fine-tuning a model pre-trained with self-supervised learning. The construction of E2E-R can be divided into two stages. In the first stage, a model pre-trained with self-supervised learning is fine-tuned on a phoneme recognition task, yielding a model that can produce accurate phoneme representations. In the second stage, a pronunciation assessment model is built using transfer learning. This model uses a Siamese neural network to compare the representations of the phonemes the user pronounced with embeddings that represent the correct pronunciation of the phonemes the user should have pronounced. The similarity score resulting from this comparison is used as the assessment of the user's pronunciation. The experimental results show that the proposed model achieves a Pearson correlation coefficient (PCC) of 0.68 when tested on the speechocean762 dataset, which is almost the same as the PCC achieved by GOPT (PAII-A), currently the best model for pronunciation assessment, without the need to collect datasets containing native speech data, to rely on feature engineering, or to use additional models to obtain phoneme start and end positions.
To the best of our knowledge, this work is the first use of a model pre-trained with self-supervised learning in end-to-end phoneme-level pronunciation assessment.
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE
Issues CD	Issued also as CD.
546 ## - LANGUAGE NOTE
Text Language	Text in English and abstract in Arabic & English.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Deep Learning
653 #1 - INDEX TERM--UNCONTROLLED
Uncontrolled term	Automatic pronunciation assessment
--	pronunciation scoring
--	pre-trained speech representations
--	self-supervised speech representation learning
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Aly Aly Fahmy
Relator term	thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Khaled Wassif
Relator term	thesis advisor.
700 0# - ADDED ENTRY--PERSONAL NAME
Personal name	Hanaa Mobarez
Relator term	thesis advisor.
900 ## - Thesis Information
Grant date	01-01-2024
Supervisory body	Aly Aly Fahmy
--	Khaled Wassif
--	Hanaa Mobarez
Universities	Cairo University
Faculties	Faculty of Computers and Artificial Intelligence
Department	Department of Computer Science
905 ## - Cataloger and Reviser Names
Cataloger Name	Shimaa
Reviser Names	Eman Ghareb
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Thesis
Edition	21
Suppress in OPAC	No

Holdings
Source of classification or shelving scheme	Home library	Current library	Date acquired	Inventory number	Full call number	Barcode	Date last seen	Effective from	Koha item type
Dewey Decimal Classification	New Central Library - Cairo University	Theses Hall - First Floor	23.06.2025	91685	Cai01.20.03.M.Sc.2024.Ah.E	01010110091685000	23.06.2025	23.06.2025	Thesis
