Arabic document layout analysis using machine learning and connected components based features / Rana Sobhy Mostafa Saad ; Supervised by Neamt Sayed Abdelkader, Samia Abdelrazeq Mashaly

Material type: Text
Language: English
Publication details: Cairo : Rana Sobhy Mostafa Saad, 2018
Description: 122 p. : charts, facsimiles ; 30 cm
Other title:
  • تحليل هيئة الوثائق العربية باستخدام تعلم الآلة و سمات المكونات المترابطة [Added title page title]
Available additional physical forms:
  • Issued also as CD
Dissertation note: Thesis (M.Sc.) - Cairo University - Faculty of Engineering - Department of Electronics and Communications
Summary: Document Layout Analysis (DLA) is a key preprocessing stage for optical character recognition (OCR). It locates and delimits the text and non-text regions of a document image. Arabic DLA has received less attention than DLA for other languages because appropriate publicly available research datasets are lacking. A full DLA pipeline is composed of several stages: input document preprocessing, document Physical Layout Analysis (PLA), document Logical Layout Analysis (LLA), and representation of the analysis output. In this thesis, geometric features of connected components (CCs) are used to represent Arabic document images. These CC features are classified by Support Vector Machine (SVM) and Random Forest (RF) classifiers into text and non-text components to perform PLA on scanned Arabic book pages. Experiments on BCE-v1 and on other researchers' datasets showed remarkable performance for both the SVM-based and RF-based solutions. Comparison with other classical and state-of-the-art systems showed the strength of the proposed system and its promise for application to wider problem domains.
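The summary states that connected components are described by geometric features before classification. The following is a minimal sketch of that feature-extraction idea only, not the thesis's implementation. The record does not list the actual features, so the ones computed here (bounding-box width and height, pixel area, aspect ratio, density), the 8-connectivity choice, the function names, and the toy `page` array are all illustrative assumptions. In a full pipeline, feature vectors like these would be passed to an SVM or RF classifier, which is omitted here.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected foreground pixels (value 1) into lists of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def geometric_features(pixels):
    """Hypothetical feature set for one component (an assumption, not the thesis's list)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {
        "width": width,
        "height": height,
        "area": area,
        "aspect_ratio": width / height,
        "density": area / (width * height),  # share of the bounding box that is ink
    }


# Toy binarized page: 1 = ink, 0 = background (made-up data for illustration).
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
features = [geometric_features(cc) for cc in connected_components(page)]
```

Each entry in `features` is one component's feature vector. A text/non-text classifier would be trained on such vectors using labeled components from a dataset such as those named in the summary.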
Holdings
  • Item type: Thesis; Current library: University Theses Hall - First Floor; Home library: New Central Library - Cairo University; Call number: Cai01.13.08.M.Sc.2018.Ra.A; Status: Not for loan; Barcode: 01010110077523000
  • Item type: CD-ROM; Current library: University Theses Store - Basement; Home library: New Central Library - Cairo University; Call number: Cai01.13.08.M.Sc.2018.Ra.A; Copy number: 77523.CD; Status: Not for loan; Barcode: 01020110077523000
