An enhanced hybrid approach for word segmentation / Mohamed Karam Ali Farag Allah ; Supervised Hesham Ahmed Hefny

By:

Mohamed Karam Ali Farag Allah

Contributor(s):

Hesham Ahmed Hefny

Material type: Text

Language: English

Publication details: Cairo : Mohamed Karam Ali Farag Allah , 2018

Description: 91 Leaves : charts ; 30cm

Other title:

أسلوب مختلط محسن لتقسيم الكلمة [Added title page title]

Subject(s):

Available additional physical forms:

Issued also as CD

Dissertation note: Thesis (M.Sc.) - Cairo University - Institute of Statistical Studies and Research - Department of Computer and Information Science

Summary: Word segmentation is the process of finding the most likely sequence of words in a sequence of characters that has no clear delimiters. The main problems of word segmentation methods are ambiguity and the need for a large dataset. Several researchers have proposed solutions to word segmentation using heuristic techniques, which aim to find the best segmentation without searching the entire state space. The performance of a word segmentation method can be measured using quantitative measures such as recall, precision, and F-measure. This research makes two main contributions. The first is a hybrid approach for word segmentation. The second is a GA-based parameter optimization for the word segmentation method. The proposed method without optimization is compared with related work and found to perform as well as or better than the other methods. Additionally, the results of the method before and after optimization are compared, and the method after parameter optimization achieved better results. To show that the presented approach is language independent, it is further evaluated on the Chinese and Arabic languages. For the Arabic language, a dataset of 10 million words is used. The F-measure result before the optimization is 89.1%
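The summary describes segmentation as a search for the most likely word sequence, scored by precision, recall, and F-measure. The sketch below illustrates those two ideas only. It is not the thesis's hybrid method, which the record does not describe. It assumes a plain unigram word model searched with dynamic programming, and scores words by their character spans. The names `segment`, `spans`, `prf`, the `unknown_logp` penalty, and the toy probabilities are all hypothetical.

```python
import math


def segment(text, word_probs, max_len=10, unknown_logp=-20.0):
    """Most probable word sequence under a unigram model (dynamic programming)."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in word_probs:
                logp = math.log(word_probs[piece])
            elif len(piece) == 1:
                logp = unknown_logp  # assumed penalty for an unknown single character
            else:
                continue
            if best[j] + logp > best[i]:
                best[i] = best[j] + logp
                back[i] = j
    # Follow the back-pointers from the end of the text to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def spans(words):
    """Character spans (start, end) covered by each word."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def prf(predicted, gold):
    """Precision, recall, and F-measure over word spans."""
    p, g = spans(predicted), spans(gold)
    correct = len(p & g)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy probabilities, invented for illustration only.
toy_probs = {"the": 0.4, "cat": 0.2, "cats": 0.1, "sat": 0.2, "at": 0.1}
result = segment("thecatsat", toy_probs)
scores = prf(["the", "cats", "at"], ["the", "cat", "sat"])
```

In this toy case the string "thecatsat" is ambiguous between "the cat sat" and "the cats at", and the unigram scores decide between them. A word counts as correct only when both of its boundaries match the reference segmentation.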


Holdings
Item type	Current library	Home library	Call number	Copy number	Status	Barcode
Thesis	Theses Hall - First Floor	New Central Library - Cairo University	Cai01.18.02.M.Sc.2018.Mo.E		Not for loan	01010110076202000
CD - Rom	Theses Storage - Basement	New Central Library - Cairo University	Cai01.18.02.M.Sc.2018.Mo.E	76202.CD	Not for loan	01020110076202000

