An enhanced hybrid approach for word segmentation /
Mohamed Karam Ali Farag Allah
An enhanced hybrid approach for word segmentation / أسلوب مختلط محسن لتقسيم الكلمة / Mohamed Karam Ali Farag Allah ; Supervised by Hesham Ahmed Hefny - Cairo : Mohamed Karam Ali Farag Allah, 2018 - 91 leaves : charts ; 30 cm
Thesis (M.Sc.) - Cairo University - Institute of Statistical Studies and Research - Department of Computer and Information Science
Word segmentation is the process of finding the most likely sequence of words in a sequence of characters that has no clear delimiters. The main problems of word segmentation methods are ambiguity and the need for a large dataset. Several researchers have proposed solutions to word segmentation using heuristic techniques, which aim to find the best segmentation without searching the entire state space. The performance of a word segmentation method can be measured using quantitative measures such as recall, precision and F-measure. This research makes two main contributions. The first is a hybrid approach for word segmentation. The second is a GA-based parameter optimization for the word segmentation method. The proposed method without optimization is compared to related work, and it was found to perform as well as or better than the other methods. Additionally, the results of the method before and after optimization are compared, and it was found that the method achieved better results after parameter optimization. To show that the presented approach is language independent, it is further tested on the Chinese and Arabic languages. For the Arabic language, a dataset of 10 million words is used. The F-measure before the optimization is 89.1%
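The abstract names recall, precision and F-measure as the evaluation measures but does not state how they are computed. The following is a minimal sketch of one common word-level scoring convention, not necessarily the one used in the thesis: a predicted word counts as correct only if both of its character boundaries match a word in the reference segmentation. The function names `word_spans` and `segmentation_scores` and the sample sentence are illustrative assumptions.

```python
def word_spans(words):
    """Map a list of words to a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        if end > start:  # skip empty strings
            spans.add((start, end))
        start = end
    return spans


def segmentation_scores(gold, predicted):
    """Return (precision, recall, f_measure) for one segmented sentence.

    Assumption: both segmentations cover the same character sequence,
    and a predicted word is correct only if its span appears in gold.
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("segmentations must cover the same characters")

    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)

    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    gold = ["the", "cat", "sat"]
    predicted = ["the", "cat", "s", "at"]
    p, r, f = segmentation_scores(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this example two of the four predicted words match the reference, so precision is 2/4 and recall is 2/3. Over a corpus, the counts of correct, predicted and reference words would typically be summed across sentences before computing the ratios.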
An enhanced hybrid approach ; Genetic Algorithms ; Word segmentation