Mohamed Salem Mohamed Elhady

Advanced techniques in speaker diarization for Arabic TV broadcast / تقنيات متقدمة في فصل المتحدثين في البث التلفزيوني العربي Mohamed Salem Mohamed Elhady ; Supervised Mohsen Abdelrazeq Rashwan , Sherif Mahdy Abdou - Cairo : Mohamed Salem Mohamed Elhady , 2017 - 79 P. : charts , facsimiles ; 30cm

Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Electronics and Communications

Speaker diarization is the task of answering the question "who spoke when?" for an audio file or a set of audio files containing an unknown number of speakers. The speaker segments are determined in an unsupervised manner. Our speaker diarization system is composed of two main blocks: a speech activity detector and speaker clustering. For speech activity detection we propose several solutions, including a phoneme recognition system, an SVMHMM system, and an i-vector based system. For speaker clustering we propose an enhancement over state-of-the-art techniques such as cosine-based hierarchical agglomerative clustering. This enhancement refines the clustering with classification methods such as SVM, DNN, and random forest. Finally, we investigated enhancing the i-vector representation by extracting the i-vectors from a DNN-based background model.
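
As an illustration of the cosine-based hierarchical agglomerative clustering mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes each speech segment is already represented by a nonzero fixed-length vector (for example an i-vector), uses average linkage, and stops merging at a distance threshold; the function names, the linkage choice, the threshold value, and the toy vectors are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_segments(vectors, threshold=0.5):
    """Average-linkage agglomerative clustering with cosine distance.

    Starts with one cluster per segment and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold` (an assumed stopping rule). Returns one label per vector.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise cosine distance between members of two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy segment vectors: two pointing mostly along each axis.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = cluster_segments(segments, threshold=0.5)
print(labels)
```

On the toy vectors, the first two segments end up in one cluster and the last two in another. In a real system the number of speakers is unknown, so the threshold would have to be tuned on held-out data.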

Subjects--Index Terms: Machine Learning ; Speaker Diarization ; Speech Processing